Reveals that using reasoning LLMs as judges can lead to policies that generate highly effective adversarial outputs, deceiving other judges and inflating benchmark scores.
Breaks Assumption arxiv | Mar 13
Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.
Paradigm Shift arxiv | Mar 13
Integrates Chain-of-Thought reasoning directly into the Diffusion Transformer denoising process to solve complex spatial and logical tasks.
New Capability arxiv | Mar 13
Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.
Efficiency Breakthrough arxiv | Mar 13
Uncovers an emergent Hue-Saturation-Lightness (HSL) subspace in FLUX.1's VAE latent space, allowing for precise, training-free color control.
Breaks Assumption arxiv | Mar 13
Enables VideoLLMs to perform complex logical reasoning simultaneously with video playback without incurring the latency of standard test-time scaling.
New Capability arxiv | Mar 13
Releases an open foundation model for humanoid robots that achieves high performance with only 30 hours of real-world robot data by pre-training on egocentric human videos.
Open Release arxiv | Mar 13
Presents a unified streaming visual backbone that performs perception, 3D reconstruction, and robotic action simultaneously from a continuous video stream.
New Capability arxiv | Mar 13
Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.
Efficiency Breakthrough arxiv | Mar 13
Demonstrates that the stochasticity in standard regularized model training (such as cross-validation) can serve as a 'free' and effective exploration strategy for contextual bandits.
Paradigm Shift arxiv | Mar 13