Proposes modeling the world in the feature space of frozen geometry foundation models instead of pixels, achieving 5x faster depth forecasting.
Paradigm Shift arxiv | Mar 16
A retrosynthesis model that explicitly learns strategic bond-disconnection reasoning via reinforcement learning with a round-trip accuracy reward.
New Capability arxiv | Mar 16
Longitudinal evidence reveals that the outputs of successive ChatGPT versions are converging and losing diversity, suggesting potential model collapse from synthetic data saturation.
Scaling Insight arxiv | Mar 16
A new system enables humanoid robots to play competitive tennis rallies with humans by learning from imperfect, fragmented motion data.
New Capability arxiv | Mar 16
Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.
Scaling Insight arxiv | Mar 16
Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.
Efficiency Breakthrough arxiv | Mar 16
Probing of Vision-Language-Action (VLA) models reveals that the action decoder largely ignores the reasoning logic in Chain-of-Thought, relying almost exclusively on object names.
Breaks Assumption arxiv | Mar 16
SciDesignBench provides a massive simulator-grounded environment for scientific inverse design, revealing that current LLMs struggle significantly with iterative refinement.
New Capability arxiv | Mar 16
A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.
Efficiency Breakthrough arxiv | Mar 16
The TaoBench benchmark shows that state-of-the-art math LLMs fail on equivalent logic problems when they are presented outside the standard 'Mathlib' framework.
Breaks Assumption arxiv | Mar 16
A self-supervised robotic system detects novel objects by training bespoke detectors on-the-fly from human video demonstrations, bypassing language-based prompts.
New Capability arxiv | Mar 16
AIM enables post-training modulation of large models to change utility levels or focus features without any retraining or additional data.
New Capability arxiv | Mar 16
Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.
Efficiency Breakthrough arxiv | Mar 16
First training-free method for debiasing reward models using Sparse Autoencoder (SAE) interventions.
New Capability arxiv | Mar 16
Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.
Breaks Assumption arxiv | Mar 16
A flow-based navigation policy that achieves zero-shot sim-to-real transfer across wheeled, quadrupedal, and humanoid platforms.
New Capability arxiv | Mar 16
A small-scale molecular reasoning model that outperforms ultra-large foundation models via structured chain-of-thought and RL.
Paradigm Shift arxiv | Mar 16
Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.
Efficiency Breakthrough arxiv | Mar 16
Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.
Efficiency Breakthrough arxiv | Mar 16
Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.
Breaks Assumption arxiv | Mar 16
MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.
New Capability arxiv | Mar 16
ThinkStream introduces a 'Watch-Think-Speak' paradigm for video reasoning that lets models incrementally update their understanding and decide when to respond in real time.
Paradigm Shift arxiv | Mar 16
This paper presents an exact federated unlearning protocol for foundation models whose result is pointwise identical to centralized retraining while using only fixed-size messages.
Breaks Assumption arxiv | Mar 16
CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.
Efficiency Breakthrough arxiv | Mar 16
This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.
Breaks Assumption arxiv | Mar 16
Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.
Efficiency Breakthrough arxiv | Mar 16
Multimodal OCR (MOCR) treats charts, diagrams, and tables as code-level targets (e.g., TikZ, SVG) rather than just cropping them as pixels.
New Capability arxiv | Mar 16
A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.
Breaks Assumption arxiv | Mar 16
Connects DDIM reverse chains to fractal geometry, providing a mathematical explanation for why diffusion models switch from global context to local detail.
Paradigm Shift arxiv | Mar 16
Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.
Breaks Assumption arxiv | Mar 16
Induces pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.
Efficiency Breakthrough arxiv | Mar 16
Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.
Efficiency Breakthrough arxiv | Mar 16
Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.
Paradigm Shift arxiv | Mar 16
Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.
Efficiency Breakthrough arxiv | Mar 16
Enables RMSNorm to reuse MXFP8 block scales, shrinking the reduction operation by 32x and delivering a 2.4x kernel speedup.
Efficiency Breakthrough arxiv | Mar 16
Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.
Breaks Assumption arxiv | Mar 16
STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or lights go out.
Breaks Assumption arxiv | Mar 16
Optimizes diffusion models via Direct Preference Optimization (DPO) to generate human motion that is inherently executable by real humanoid robots.
New Capability arxiv | Mar 16
Reimagines 3D molecules as continuous vector fields rather than discrete graphs, decoupling structure learning from atom types.
Paradigm Shift arxiv | Mar 16
Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.
Scaling Insight arxiv | Mar 16
Time moving forward might just be a glitch caused by the universe being bad at copying its own homework.
Paradigm Challenge arxiv | Mar 13
We’ve finally made digital messages that are physically impossible to copy—even a perfect hacker couldn't do it because physics won't allow it.
Practical Magic arxiv | Mar 13
Scientists built an AI that treats crop-raiding elephants like chess opponents to predict exactly where they’ll strike next.
Nature Is Weird arxiv | Mar 13
The massive satellite network the government uses is accidentally blasting out people's private passwords in plain text for anyone to see.
Cosmic Scale arxiv | Mar 13
OpenSanctions Pairs releases a massive benchmark for entity matching, proving that local LLMs can now match production rule-based systems in high-stakes compliance tasks.
Open Release arxiv | Mar 13
Speculative Decoding Scaling Laws (SDSL) provide a theoretical framework for predicting throughput-optimal hyperparameters for LLM inference systems before pre-training.
Scaling Insight arxiv | Mar 13
This paper introduces a graph tokenization framework that allows standard Transformers like BERT to beat specialized Graph Neural Networks without any architectural changes.
Paradigm Shift arxiv | Mar 13
The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.
Efficiency Breakthrough arxiv | Mar 13
Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.
Breaks Assumption arxiv | Mar 13
A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.
New Capability arxiv | Mar 13
REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.
Efficiency Breakthrough arxiv | Mar 13
PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.
Efficiency Breakthrough arxiv | Mar 13
Continual Representation Learning (CoRe) moves PEFT from weight-level updates to representation-space interventions, solving catastrophic forgetting in dynamic environments.
Paradigm Shift arxiv | Mar 13
Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.
Scaling Insight arxiv | Mar 13
SoLA introduces the first reversible model editing framework that allows precise revocation of specific knowledge updates.
New Capability arxiv | Mar 13
LLM-based user simulators create an 'easy mode' for agents that fails to capture real human frustration, ambiguity, and feedback nuances.
Breaks Assumption arxiv | Mar 13
Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.
Breaks Assumption arxiv | Mar 13
InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.
Efficiency Breakthrough arxiv | Mar 13
Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.
Paradigm Shift arxiv | Mar 13
HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.
Paradigm Shift arxiv | Mar 13
Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.
Scaling Insight arxiv | Mar 13
RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.
New Capability arxiv | Mar 13
TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.
Efficiency Breakthrough arxiv | Mar 13
MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.
Breaks Assumption arxiv | Mar 13
An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.
Breaks Assumption arxiv | Mar 13
This paper introduces Finsler geometry to manifold learning, capturing asymmetric data relationships, such as density hierarchies, that Riemannian methods ignore.
Paradigm Shift arxiv | Mar 13
Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.
Breaks Assumption arxiv | Mar 13
DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.
Efficiency Breakthrough arxiv | Mar 13
Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.
Breaks Assumption arxiv | Mar 13
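The softmax constraint behind the attention-sink item can be illustrated with a minimal sketch. It is not the paper's analysis: the logits and the single "sink" position are hypothetical values chosen only to show that a softmax row always sums to 1, so a head that wants to ignore its content tokens needs somewhere else to put the weight.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: the head scores all three content tokens as irrelevant.
content_logits = [-4.0, -4.0, -4.0]

# Without a sink, the weights still sum to 1, so the "ignored" tokens
# each receive one third of the attention.
no_sink = softmax(content_logits)

# With one extra (hypothetical) sink position at logit 0, most of the
# mass moves to the sink and the content tokens are nearly ignored.
with_sink = softmax([0.0] + content_logits)
content_mass_with_sink = sum(with_sink[1:])

print(no_sink)
print(content_mass_with_sink)
```

In this toy setup the sink acts as the 'null state' the summary describes: it absorbs weight that normalization forces the head to spend somewhere.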
LongFlow provides an 11x throughput boost for reasoning models by specifically optimizing KV cache for long-output (vs long-input) scenarios.
Efficiency Breakthrough arxiv | Mar 13
Manifold-Optimal Guidance reformulates Classifier-Free Guidance (CFG) as a Riemannian control problem, eliminating the artifacts and saturation typical of high guidance scales.
Paradigm Shift arxiv | Mar 13
Tiny Aya is a 3.35B-parameter multilingual model that achieves state-of-the-art results across 70 languages, challenging the need for massive scale in global AI.
Open Release arxiv | Mar 13
An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.
Breaks Assumption arxiv | Mar 13
Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.
Efficiency Breakthrough arxiv | Mar 13
Expert Threshold Routing (ET) replaces standard top-k token-choice with an independent thresholding mechanism, achieving 1.6x faster training convergence.
Paradigm Shift arxiv | Mar 13
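The contrast between top-k token-choice and independent thresholding can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function names, the router scores, and the per-expert thresholds are all hypothetical, and how thresholds are set or learned is not covered in the summary.

```python
def top_k_route(scores, k):
    """Token-choice top-k: every token is sent to exactly k experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def threshold_route(scores, thresholds):
    """Independent thresholding: each expert accepts the token on its own,
    so the number of selected experts can differ from token to token."""
    return [i for i, (s, t) in enumerate(zip(scores, thresholds)) if s > t]

# Hypothetical router scores for two tokens over four experts.
token_a = [0.9, 0.8, 0.7, 0.1]     # strongly matches three experts
token_b = [0.3, 0.2, 0.1, 0.05]    # matches no expert well
thresholds = [0.5, 0.5, 0.5, 0.5]  # assumed per-expert thresholds

topk_a = top_k_route(token_a, k=2)
topk_b = top_k_route(token_b, k=2)
thr_a = threshold_route(token_a, thresholds)
thr_b = threshold_route(token_b, thresholds)

print(topk_a, topk_b)  # same expert count for both tokens
print(thr_a, thr_b)    # expert count depends on the scores
```

The point of the toy example is only that thresholding decouples each expert's decision from the others, whereas top-k fixes the per-token expert count.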
RoboClaw introduces 'Entangled Action Pairs' to allow robots to autonomously collect data by learning to reset their own environment.
New Capability arxiv | Mar 13
The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.
Breaks Assumption arxiv | Mar 13
Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache.
Efficiency Breakthrough arxiv | Mar 13
Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.
Scaling Insight arxiv | Mar 13
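Partial rotary embedding, as described in the RoPE item, can be sketched in a few lines. This is a minimal illustration rather than the paper's code: it assumes the common interleaved-pair convention and a base of 10000, and the function name `partial_rope` and the 20-dimensional example vector are hypothetical.

```python
import math

def partial_rope(x, pos, rotary_dim, base=10000.0):
    """Rotate only the first `rotary_dim` entries of x by position-dependent
    angles; the remaining entries carry no positional signal."""
    assert rotary_dim % 2 == 0 and rotary_dim <= len(x)
    out = list(x)
    for i in range(0, rotary_dim, 2):
        # Pair (i, i+1) uses frequency base^(-i / rotary_dim).
        theta = pos * base ** (-i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out

# Hypothetical 20-dim vector with rotation applied to 10% of dims (2 of 20).
x = [1.0, 2.0] + [0.5] * 18
y = partial_rope(x, pos=7, rotary_dim=2)

print(y[:2])         # rotated pair
print(y[2:] == x[2:])  # untouched tail
```

Only `rotary_dim / 2` angles per position are needed here, so a cached cos/sin table shrinks in proportion to the rotated fraction of the hidden dimensions.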
Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.
Efficiency Breakthrough arxiv | Mar 13
Shows that simple sequential fine-tuning with LoRA outperforms complex algorithms for continual reinforcement learning in VLA models.
Breaks Assumption arxiv | Mar 13
Proves that policy gradient algorithms naturally collapse entropy and provides a mathematical fix to preserve exploration and diversity.
Breaks Assumption arxiv | Mar 13
Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.
Efficiency Breakthrough arxiv | Mar 13
Introduces the Compression-Consistency Principle, arguing that LLMs prefer truth only when false alternatives are structurally harder to compress.
Paradigm Shift arxiv | Mar 13
Replaces unstructured LLM debates with 'Deliberative Collective Intelligence,' producing formal decision packets with minority reports and accountability trails.
New Capability arxiv | Mar 13
Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.
Scaling Insight arxiv | Mar 13
Enables agents to autonomously discover the group structure of their environments to learn disentangled representations without human priors.
Paradigm Shift arxiv | Mar 13
Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.
Efficiency Breakthrough arxiv | Mar 13
Automates the entire robotic data generation loop, including a self-resetting mechanism that restores unstructured workspaces without human intervention.
New Capability arxiv | Mar 13
Bridges the gap between parametric CAD and direct B-Rep synthesis using LLMs and primitive grounding.
New Capability arxiv | Mar 13
Eliminates lookahead bias in financial backtesting through a series of yearly-partitioned pretrained LLMs.
Paradigm Shift arxiv | Mar 13
Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.
Efficiency Breakthrough arxiv | Mar 13
Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.
Efficiency Breakthrough arxiv | Mar 13
Enables concurrent perception and reasoning for continuous video streams in Multimodal Large Language Models.
New Capability arxiv | Mar 13
Fits promptable visual segmentation (SAM) into a 1.3M-parameter model for real-time in-sensor execution.
Efficiency Breakthrough arxiv | Mar 13
First framework for interpreting 4D molecular trajectories into natural language explanations.
New Capability arxiv | Mar 13
Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.
Scaling Insight arxiv | Mar 13
Solves GNN over-squashing by using global effective resistance to identify and rewire structural bottlenecks.
Paradigm Shift arxiv | Mar 13
Cross-domain sensor model that handles variable signal lengths and resolutions without retraining.
New Capability arxiv | Mar 13
Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.
Efficiency Breakthrough arxiv | Mar 13