Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.
Breaks Assumption arxiv | Mar 13
InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.
Efficiency Breakthrough arxiv | Mar 13
Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.
Paradigm Shift arxiv | Mar 13
HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.
Paradigm Shift arxiv | Mar 13
Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.
Scaling Insight arxiv | Mar 13
RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.
New Capability arxiv | Mar 13
TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.
Efficiency Breakthrough arxiv | Mar 13
MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.
Breaks Assumption arxiv | Mar 13
An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.
Breaks Assumption arxiv | Mar 13
This paper introduces Finsler geometry to manifold learning, allowing for the capture of asymmetric data relationships like density hierarchies that Riemannian methods ignore.
Paradigm Shift arxiv | Mar 13
Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.
Breaks Assumption arxiv | Mar 13
DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.
Efficiency Breakthrough arxiv | Mar 13
Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.
Breaks Assumption arxiv | Mar 13
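A minimal sketch of the normalization argument, not the paper's proof: softmax weights are positive and sum to 1, so a head attending only over content tokens cannot return a near-zero output even when every score is low. Adding a sink position lets it do so. The scalar values, the scores, and the zero-valued sink below are illustrative assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values):
    """Attention output for one query: softmax-weighted sum of values."""
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))

# Three content tokens (scalar values for simplicity) that the head "wants" to ignore.
values = [1.0, 2.0, 3.0]
low_scores = [-5.0, -5.0, -5.0]

# Without a sink, equally low scores still normalize to uniform weights,
# so the output is the mean of the values rather than something near zero.
no_sink = attend(low_scores, values)

# Assumed sink: an extra position with a high score and a zero value.
# It absorbs almost all of the attention mass, giving a near-null output.
with_sink = attend([5.0] + low_scores, [0.0] + values)

print(no_sink, with_sink)
```

The point of the comparison is that lowering all scores together changes nothing after normalization; only a competing position can draw mass away from the content tokens.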
LongFlow provides an 11x throughput boost for reasoning models by optimizing the KV cache specifically for long-output (rather than long-input) scenarios.
Efficiency Breakthrough arxiv | Mar 13
Manifold-Optimal Guidance reformulates Classifier-Free Guidance (CFG) as a Riemannian control problem, eliminating the artifacts and saturation typical of high guidance scales.
Paradigm Shift arxiv | Mar 13
Tiny Aya is a 3.35B-parameter multilingual model that achieves state-of-the-art results across 70 languages, challenging the need for massive scale in global AI.
Open Release arxiv | Mar 13
An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.
Breaks Assumption arxiv | Mar 13
Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.
Efficiency Breakthrough arxiv | Mar 13
Expert Threshold Routing (ET) replaces standard top-k token-choice with an independent thresholding mechanism, achieving 1.6x faster training convergence.
Paradigm Shift arxiv | Mar 13
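The difference between the two routing rules can be sketched as follows. This is an illustration of the general idea only: the function names, the router scores, and the fixed per-expert thresholds of 0.5 are assumptions, and the summary does not say how the paper sets or learns its thresholds.

```python
def top_k_route(scores, k):
    """Token-choice top-k: the token always picks its k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return sorted(ranked[:k])

def threshold_route(scores, thresholds):
    """Threshold routing: each expert independently accepts the token
    if the token's score for that expert reaches the expert's threshold."""
    return [e for e, (s, t) in enumerate(zip(scores, thresholds)) if s >= t]

# Hypothetical router scores for two tokens over four experts.
token_a = [0.9, 0.8, 0.7, 0.1]   # strongly matches three experts
token_b = [0.6, 0.2, 0.1, 0.1]   # strongly matches only one expert
thresholds = [0.5, 0.5, 0.5, 0.5]  # assumed fixed values for illustration

print(top_k_route(token_a, k=2), threshold_route(token_a, thresholds))
print(top_k_route(token_b, k=2), threshold_route(token_b, thresholds))
```

With top-k, both tokens get exactly two experts regardless of fit; with independent thresholds, the number of experts per token varies with the scores.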
RoboClaw introduces 'Entangled Action Pairs' to allow robots to autonomously collect data by learning to reset their own environment.
New Capability arxiv | Mar 13
The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.
Breaks Assumption arxiv | Mar 13
Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.
Efficiency Breakthrough arxiv | Mar 13
Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.
Scaling Insight arxiv | Mar 13
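A minimal sketch of partial rotary embeddings, assuming the standard RoPE pairing and frequency schedule: only the first fraction of dimensions is rotated by position, and the rest pass through unchanged. The function name, the `rotary_fraction` parameter, and the choice to rotate the leading dimensions are assumptions, not details from the paper.

```python
import math

def partial_rope(vec, position, rotary_fraction=0.1, base=10000.0):
    """Rotate only the first `rotary_fraction` of dimensions, in pairs."""
    dim = len(vec)
    rot_dim = int(dim * rotary_fraction)
    rot_dim -= rot_dim % 2            # rotation works on pairs, so keep it even
    out = list(vec)
    for i in range(0, rot_dim, 2):
        # Standard RoPE angle for the pair starting at index i.
        theta = position * base ** (-i / rot_dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

vec = [1.0] * 20                      # 20 dims, so 10% means one rotated pair
rotated = partial_rope(vec, position=3)
print(rotated[:4])
```

Because the cosine and sine tables are only needed for the rotated dimensions, any cache of those tables shrinks in proportion to `rotary_fraction`.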
Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.
Efficiency Breakthrough arxiv | Mar 13
Shows that simple sequential fine-tuning with LoRA outperforms complex algorithms for continual reinforcement learning in VLA models.
Breaks Assumption arxiv | Mar 13
Proves that policy gradient algorithms naturally collapse entropy and provides a mathematical fix to preserve exploration and diversity.
Breaks Assumption arxiv | Mar 13
Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.
Efficiency Breakthrough arxiv | Mar 13
Introduces the Compression-Consistency Principle, arguing that LLMs prefer truth only when false alternatives are structurally harder to compress.
Paradigm Shift arxiv | Mar 13
Replaces unstructured LLM debates with 'Deliberative Collective Intelligence,' producing formal decision packets with minority reports and accountability trails.
New Capability arxiv | Mar 13
Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.
Scaling Insight arxiv | Mar 13
Enables agents to autonomously discover the group structure of their environments to learn disentangled representations without human priors.
Paradigm Shift arxiv | Mar 13
Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.
Efficiency Breakthrough arxiv | Mar 13
Automates the entire robotic data generation loop, including a self-resetting mechanism that restores unstructured workspaces without human intervention.
New Capability arxiv | Mar 13
Bridges the gap between parametric CAD and direct B-Rep synthesis using LLMs and primitive grounding.
New Capability arxiv | Mar 13
Eliminates lookahead bias in financial backtesting through a series of yearly-partitioned pretrained LLMs.
Paradigm Shift arxiv | Mar 13
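One way to picture the yearly partitioning: for each backtest date, use only a model whose training data ends before that date. The sketch below assumes a hypothetical registry keyed by training cutoff year (data through the end of that year) with made-up model identifiers; the paper's actual partitioning scheme may differ.

```python
from datetime import date

# Hypothetical registry: training-data cutoff year -> model identifier.
MODELS_BY_CUTOFF_YEAR = {2019: "llm-2019", 2020: "llm-2020", 2021: "llm-2021"}

def model_for(as_of, registry=MODELS_BY_CUTOFF_YEAR):
    """Return the newest model whose training data ends before as_of's year."""
    eligible = [year for year in registry if year < as_of.year]
    if not eligible:
        raise LookupError(f"no model trained only on data before {as_of.year}")
    return registry[max(eligible)]

# A mid-2021 decision may only use the model trained through 2020.
print(model_for(date(2021, 6, 1)))
```

Refusing to fall back to a later model when none is eligible is deliberate: a silent fallback would reintroduce the lookahead the partitioning is meant to remove.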
Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.
Efficiency Breakthrough arxiv | Mar 13
Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.
Efficiency Breakthrough arxiv | Mar 13
Enables concurrent perception and reasoning for continuous video streams in Multimodal Large Language Models.
New Capability arxiv | Mar 13
Fits promptable visual segmentation (SAM) into a 1.3M-parameter model for real-time in-sensor execution.
Efficiency Breakthrough arxiv | Mar 13
First framework for translating 4D molecular trajectories into natural-language explanations.
New Capability arxiv | Mar 13
Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.
Scaling Insight arxiv | Mar 13
Solves GNN over-squashing by using global effective resistance to identify and rewire structural bottlenecks.
Paradigm Shift arxiv | Mar 13
Cross-domain sensor model that handles variable signal lengths and resolutions without retraining.
New Capability arxiv | Mar 13
Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.
Efficiency Breakthrough arxiv | Mar 13
Introduces the first billion-scale SAR vision foundation model and a massive unified benchmark for all-weather geospatial semantic segmentation.
Open Release arxiv | Mar 13
Demonstrates that simply using XML tags during translation outperforms complex pipelines for cross-lingual label projection while actually improving translation quality.
Breaks Assumption arxiv | Mar 13
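The tag-based approach can be sketched in two steps: wrap labeled spans in XML tags before translation, then read the tags back out of the translated text. The helper names are hypothetical, the translation step is simulated with a hand-written string rather than a real system, and the sketch assumes non-overlapping, non-nested spans.

```python
import re

def tag_spans(text, spans):
    """Wrap each (start, end, label) span in an XML tag named after its label."""
    out, cursor = [], 0
    for start, end, label in sorted(spans):
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

TAG = re.compile(r"<(\w+)>(.*?)</\1>")

def extract_spans(tagged):
    """Strip tags and return (clean_text, [(start, end, label), ...])."""
    clean, spans, cursor, pos = [], [], 0, 0
    for m in TAG.finditer(tagged):
        before = tagged[cursor:m.start()]
        clean.append(before)
        pos += len(before)                      # offset in the tag-free text
        spans.append((pos, pos + len(m.group(2)), m.group(1)))
        clean.append(m.group(2))
        pos += len(m.group(2))
        cursor = m.end()
    clean.append(tagged[cursor:])
    return "".join(clean), spans

source = "Marie Curie was born in Warsaw."
tagged = tag_spans(source, [(0, 11, "PER"), (24, 30, "LOC")])

# Simulated output of a translator that preserves the tags (assumption).
translated = "<PER>Marie Curie</PER> wurde in <LOC>Warschau</LOC> geboren."
clean, projected = extract_spans(translated)
print(tagged)
print(clean, projected)
```

The projected offsets index into the tag-free target sentence, so the labels follow the entities even when word order changes.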
Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.
Efficiency Breakthrough arxiv | Mar 13
Enables multimodal agents to continually improve from experience and skills without any parameter updates through a dual-stream visual grounding framework.
New Capability arxiv | Mar 13
A 3D vision-language pipeline that grounds medical diagnosis in longitudinal brain MRI via regional volumetric assessments to eliminate VLM hallucinations.
New Capability arxiv | Mar 13
Integrates Neural ODEs with NeRFs to enable continuous-time scene dynamics that can extrapolate far beyond the original training sequence.
New Capability arxiv | Mar 13
Proposes a unified image tokenizer that reconciles the conflicting requirements of visual understanding and generation using a residual evolution process.
Paradigm Shift arxiv | Mar 13
Identifies and solves the 'information self-locking' failure mode where RL-trained agents stop asking informative questions in active reasoning tasks.
Breaks Assumption arxiv | Mar 13
A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.
Efficiency Breakthrough arxiv | Mar 13
Shows that LLM self-correction fails primarily due to 'session context' and can be significantly improved by moving the review to a fresh, independent session.
Breaks Assumption arxiv | Mar 13
Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.
Efficiency Breakthrough arxiv | Mar 13
Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.
Scaling Insight arxiv | Mar 13
Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.
Efficiency Breakthrough arxiv | Mar 13
Accelerates sparse attention by 75% by reusing lightning indexer decisions across layers, tackling the hidden bottleneck in production-grade LLMs.
Efficiency Breakthrough arxiv | Mar 13
Discovers that task-specific experts are so dense around pretrained weights that random parameter perturbations can compete with complex RL methods like PPO.
Breaks Assumption arxiv | Mar 13
Reveals that 'Reasoning LLMs-as-Judges' can lead to policies that generate highly effective adversarial outputs to deceive other judges and inflate benchmarks.
Breaks Assumption arxiv | Mar 13
Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.
Paradigm Shift arxiv | Mar 13
Integrates Chain-of-Thought reasoning directly into the Diffusion Transformer denoising process to solve complex spatial and logical tasks.
New Capability arxiv | Mar 13
Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.
Efficiency Breakthrough arxiv | Mar 13
Uncovers an emergent Hue-Saturation-Lightness (HSL) subspace in FLUX.1's VAE latent space, allowing for precise, training-free color control.
Breaks Assumption arxiv | Mar 13
Enables VideoLLMs to perform complex logical reasoning simultaneously with video playback without incurring the latency of standard test-time scaling.
New Capability arxiv | Mar 13
An open foundation model for humanoid robots that achieves high performance using only 30 hours of real-world robot data by pre-training on egocentric human videos.
Open Release arxiv | Mar 13
A unified streaming visual backbone that performs perception, 3D reconstruction, and robotic action simultaneously from a continuous video stream.
New Capability arxiv | Mar 13
Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.
Efficiency Breakthrough arxiv | Mar 13
Demonstrates that the stochasticity in standard regularized model training (like cross-validation) can serve as a 'free' and effective exploration strategy for contextual bandits.
Paradigm Shift arxiv | Mar 13