Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.
New Capability
Proposes URDF-Anything+, an autoregressive framework that generates fully executable articulated 3D models from raw visual observations.
New Capability
Introduces the first system capable of imaging high-speed, non-rigid objects through strong atmospheric turbulence at 16,000 pixels per second.
Paradigm Shift
Enhances mathematical reasoning in LLMs by integrating Group Relative Policy Optimization (GRPO) with a reflection-based reward mechanism.
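A minimal sketch of the group-relative advantage at the heart of GRPO, paired with a toy reflection bonus; the reward values, marker strings, and function names are illustrative, not from the paper.

```python
# Group-relative advantages (GRPO-style): normalize each sampled answer's
# reward against its own group's statistics. Illustrative sketch only.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def reflection_bonus(response: str, bonus: float = 0.1) -> float:
    """Toy reflection reward: small bonus if the response revisits its work."""
    markers = ("wait", "let me re-check", "on second thought")
    return bonus if any(m in response.lower() for m in markers) else 0.0

rewards = [1.0, 0.0, 0.5, 1.0]  # e.g. correctness of 4 samples for one prompt
advs = group_relative_advantages(rewards)
```

The normalized advantages sum to roughly zero within each group, so only relative quality among samples for the same prompt drives the update.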
Efficiency Breakthrough
Reveals that Graph-RAG performance is limited by reasoning failure rather than retrieval, and shows how to make an 8B model match a 70B baseline.
Efficiency Breakthrough
Amortizes iterative diffusion into a one-step trajectory policy for robotics using a novel 'Keyed Drift Field' objective.
Efficiency Breakthrough
Proposes a temporal mixed-precision framework for diffusion models that adaptively assigns bitwidths across different denoising timesteps.
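How a timestep-dependent bitwidth assignment might look, assuming a simple two-level schedule and uniform symmetric fake-quantization; the paper's actual policy is presumably sensitivity-driven rather than this fixed split.

```python
# Temporal mixed precision, toy version: early (sensitive) denoising steps
# get more bits than late steps. Schedule and quantizer are illustrative.
import numpy as np

def bitwidth_schedule(num_steps, hi=8, lo=4, frac_hi=0.3):
    """First `frac_hi` of denoising steps get `hi` bits, the rest `lo`."""
    cut = int(num_steps * frac_hi)
    return [hi] * cut + [lo] * (num_steps - cut)

def fake_quantize(x, bits):
    """Uniform symmetric fake-quantization to the given bitwidth."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

steps = bitwidth_schedule(10)
x = np.linspace(-1.0, 1.0, 5)
fine = fake_quantize(x, steps[0])     # 8-bit early step
coarse = fake_quantize(x, steps[-1])  # 4-bit late step
```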
Breaks Assumption
Identifies a structural flaw in the standard Expected Calibration Error (ECE) when applied to soft labels and introduces SMECE to fix it.
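For context, the standard hard-label ECE the paper critiques can be computed as below, with equal-width confidence bins and gaps weighted by bin mass; SMECE itself is not reproduced here.

```python
# Standard Expected Calibration Error: bin predictions by confidence,
# then average |accuracy - confidence| weighted by bin size.
import numpy as np

def ece(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            total += mask.mean() * gap
    return total
```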
Efficiency Breakthrough
Accelerates LLM inference by up to 1.8x using a training-free sparse pattern predictor based on SVD truncation of FFN gate matrices.
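A hedged illustration of the idea: score inputs against a rank-r SVD truncation of the gate matrix as a cheap proxy, then check how well the proxy's top-k matches the exact top-k. The dimensions, rank, and near-low-rank construction are demo assumptions.

```python
# Predict active FFN neurons from a truncated-SVD proxy of the gate matrix.
import numpy as np

rng = np.random.default_rng(0)
d, n, r, k = 64, 256, 16, 32                     # hidden dim, neurons, rank, top-k
base = rng.standard_normal((n, 8)) @ rng.standard_normal((8, d))
W = base + 0.05 * rng.standard_normal((n, d))    # near-low-rank gate matrix

U, S, Vt = np.linalg.svd(W, full_matrices=False)
W_r = (U[:, :r] * S[:r]) @ Vt[:r]                # rank-r truncation

x = rng.standard_normal(d)
pred = set(np.argsort(-(W_r @ x))[:k])           # cheap proxy top-k
true = set(np.argsort(-(W @ x))[:k])             # exact top-k
recall = len(pred & true) / k
```

In a real model the proxy would be kept in factored form so the prediction costs O(r(n+d)) instead of O(nd).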
Scaling Insight
Challenges the monotonic 'bigger is better' scaling paradigm by proving that institutional fitness peaks at an environment-dependent scale.
Paradigm Shift
Introduces Centered Reward Distillation (CRD) to stabilize diffusion reinforcement learning by removing intractable normalizing constants.
Breaks Assumption
Demonstrates that gated predictive autoencoders can match or outperform JEPA-style architectures by learning to select predictable components.
Efficiency Breakthrough
Unifies KV cache compression and sparse attention into a single 1-bit indexing structure, eliminating the need for external metadata or predictors.
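A toy version of a shared 1-bit index: a packed bitmap marking live KV blocks that both the compression bookkeeping and the sparse-attention selector could read. The packing layout is our own illustration.

```python
# Packed 1-bit membership index over KV blocks, one bit per block.
def make_bitmap(keep_flags):
    """Pack booleans into a bytearray, one bit per KV block."""
    bits = bytearray((len(keep_flags) + 7) // 8)
    for i, keep in enumerate(keep_flags):
        if keep:
            bits[i // 8] |= 1 << (i % 8)
    return bits

def is_live(bits, i):
    """Check whether block i is still resident."""
    return bool(bits[i // 8] & (1 << (i % 8)))

flags = [True, False, True, True, False, False, False, True]
bm = make_bitmap(flags)
```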
New Capability
Enables online, incremental 3D Gaussian Splatting for thousands of frames by replacing global reprocessing with a causal, streaming update framework.
Efficiency Breakthrough
Detects diffusion-generated images 126x faster than reconstruction-based methods by using Gaussian noise disturbance to exploit the statistical 'ease' of fitting synthetic data.
Breaks Assumption
Identifies that extended reasoning in Multimodal LLMs causes 'attention dispersion,' where models progressively lose focus on visual inputs as the reasoning chain lengthens.
Efficiency Breakthrough
Enables model adaptation on edge devices and non-differentiable (quantized) models using a purely backpropagation-free optimization framework.
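One classic backpropagation-free primitive that fits this setting is an SPSA-style two-probe update: estimate a descent direction from two forward evaluations only. This is a generic zeroth-order method, not the paper's specific framework.

```python
# SPSA-style zeroth-order step: no gradients, just two function probes.
import random

def spsa_step(f, params, rng, lr=0.1, c=0.01):
    """Probe f at params +/- c*delta and form a gradient estimate."""
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = f([p + c * d for p, d in zip(params, delta)])
    minus = f([p - c * d for p, d in zip(params, delta)])
    ghat = [(plus - minus) / (2 * c * d) for d in delta]
    return [p - lr * g for p, g in zip(params, ghat)]

rng = random.Random(0)
params = [1.0, -1.0]
for _ in range(100):                              # minimize a toy quadratic
    params = spsa_step(lambda ps: sum(p * p for p in ps), params, rng)
```

Because only forward evaluations are needed, the same loop works on quantized or otherwise non-differentiable models.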
Breaks Assumption
Discovers that frozen video diffusion models already encode physical plausibility in their features, allowing for cost-effective inference-time physics filtering.
New Capability
Introduces a decentralized, multi-agent framework for scientific discovery that uses an 'ArtifactReactor' for plannerless coordination and full computational lineage.
Scaling Insight
Proposes spectral clipping to stabilize LLM training by addressing 'spectral spikes' in stochastic gradient noise that adaptive optimizers like AdamW fail to handle.
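Spectral clipping in its simplest form caps the singular values of a gradient matrix at a threshold, suppressing the spikes that elementwise optimizers miss; the threshold and application point below are illustrative.

```python
# Clip the singular values of a gradient matrix at max_sv.
import numpy as np

def spectral_clip(grad, max_sv=1.0):
    U, S, Vt = np.linalg.svd(grad, full_matrices=False)
    return (U * np.minimum(S, max_sv)) @ Vt

g = np.diag([5.0, 0.5, 0.2])        # one spiked direction
clipped = spectral_clip(g, max_sv=1.0)
```

Note that elementwise clipping would shrink the whole matrix, while spectral clipping only flattens the spiked direction and leaves the rest untouched.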
Efficiency Breakthrough
Achieves real-time, low-latency talking avatar generation at 34ms per frame using a one-step streaming diffusion framework.
Scaling Insight
Introduces Matrix-to-Matrix RNNs (M²RNN) with matrix-valued hidden states that outperform hybrid Transformers while using 3x smaller state sizes.
Paradigm Shift
Proposes the 'Theory Compiler,' a system that automatically translates formal domain specifications into neural architectures with built-in physical or logical constraints.
New Capability
Introduces 'Visual Chronometer' to estimate physical frame rates directly from visual dynamics, addressing the 'chronometric hallucinations' common in generative video models.
New Capability
Segment Anything Reasoner (StAR) introduces parallel test-time scaling to visual segmentation tasks, eliciting latent reasoning capabilities from base models.
Breaks Assumption
Argues that probability gradients are superior to standard log-probability gradients for RL training, proposing a new optimization method (DGPO) to solve divergence in soft clipping.
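The contrast rests on the identity grad(p) = p * grad(log p): probability gradients down-weight updates on low-probability actions. A tiny softmax check of that identity (DGPO itself is not reproduced):

```python
# Probability gradient vs. log-probability gradient for a softmax policy.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def grad_logp_wrt_logit(logits, action):
    """d log p(action) / d logits = one_hot(action) - probs."""
    p = softmax(logits)
    return [(1.0 if i == action else 0.0) - pi for i, pi in enumerate(p)]

def grad_p_wrt_logit(logits, action):
    """d p(action) / d logits = p(action) * d log p(action) / d logits."""
    p = softmax(logits)
    return [p[action] * g for g in grad_logp_wrt_logit(logits, action)]
```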
Paradigm Shift
Presents DataEvolve, a framework that enables AI to autonomously evolve and iteratively optimize pretraining data curation strategies.
Efficiency Breakthrough
Introduces ZoomUI, a trainless method for GUI grounding that uses inference-time scaling to anchor natural language instructions to interface elements.
Efficiency Breakthrough
FLORE achieves 1000x error reduction in linear sketching while being 100x faster than previous learning-based solutions.
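For background, classic linear sketching compresses vectors with a random matrix while approximately preserving inner products; the baseline idea (not FLORE itself) is sketched below with made-up sizes.

```python
# Gaussian linear sketch: S compresses d-dim vectors to m dims while
# approximately preserving inner products and norms.
import numpy as np

rng = np.random.default_rng(2)
d, m = 1000, 200
S = rng.standard_normal((m, d)) / np.sqrt(m)   # sketching matrix

x = rng.standard_normal(d)
y = rng.standard_normal(d)
approx = (S @ x) @ (S @ y)   # sketched inner product
exact = x @ y
```

The approximation error shrinks as the sketch size m grows; learning-based methods like FLORE aim to beat this random baseline substantially.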
New Capability
V-JEPA 2.1 unlocks dense, spatially structured features in video self-supervised learning, yielding massive gains in robotic manipulation and navigation.
Paradigm Shift
Provides a new identifiability theorem for causal representation learning that uncovers physical system parameters from raw data without predefined libraries.
Scaling Insight
The Infinite Problem Generator (IPG) uses executable code to synthesize and verify 100% accurate physics reasoning data, overcoming LLM hallucination in data scaling.
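The core trick, verification by construction, can be miniaturized: generate a problem from a template and compute its label with code, so the answer is exact rather than sampled from an LLM. The template and units are our own toy choices.

```python
# Synthesize physics QA pairs whose answers are correct by construction.
import random

def make_problem(rng):
    m = rng.randint(1, 20)      # mass in kg
    a = rng.randint(1, 10)      # acceleration in m/s^2
    question = (f"A {m} kg object accelerates at {a} m/s^2. "
                "What net force acts on it?")
    answer = m * a              # F = m * a, computed, never hallucinated
    return question, answer

rng = random.Random(0)
dataset = [make_problem(rng) for _ in range(3)]
```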
Breaks Assumption
Simple regularization and data-hybrid training are shown to be sufficient to prevent catastrophic forgetting in MLLMs, challenging the need for complex anti-forgetting architectures.
Efficiency Breakthrough
SleepGate introduces a biologically inspired 'sleep cycle' for the KV cache to resolve proactive interference in long-context LLMs.
New Capability
One-Policy-Fits-All (OPFA) learns a single manipulation policy across 11 different embodiments, including grippers and dexterous hands, using geometry-aware action latents.
New Capability
Interp3R is the first method to estimate depth and camera poses at arbitrary time instants by interpolating pointmaps using asynchronous event data.
Breaks Assumption
Distilled VAE encoders are found to perform significantly better on higher, unseen resolutions than on their native training resolution.
Efficiency Breakthrough
ASAP reduces LVLM computational FLOPs by ~80% with virtually no loss in performance using a training-free KV-Cache pruning recipe.
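A generic training-free pruning recipe in this spirit keeps the KV entries with the highest cumulative attention mass; the keep-ratio and scoring rule below are assumptions, not ASAP's actual recipe.

```python
# Prune a KV cache to the entries that received the most attention.
import numpy as np

def prune_kv(keys, values, attn_weights, keep_ratio=0.2):
    """attn_weights: (num_queries, num_keys) softmax scores."""
    scores = attn_weights.sum(axis=0)            # cumulative mass per key
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(-scores)[:k])      # top-k, original order
    return keys[keep], values[keep], keep

keys = np.arange(10.0).reshape(10, 1)            # toy 10-entry cache
values = keys.copy()
attn = np.full((4, 10), 0.05)
attn[:, 3] = 0.55                                # key 3 dominates attention
kept_k, kept_v, kept_idx = prune_kv(keys, values, attn, keep_ratio=0.1)
```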
New Capability
MorFiC achieves zero-shot locomotion transfer across quadrupeds of different sizes and masses with up to 5x speed gains over standard baselines.
Paradigm Shift
Top-b sampling introduces entropy-aware adaptive bandwidth for LLM decoding, effectively approximating a self-regulating control system for generation.
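A toy entropy-to-bandwidth mapping: flat distributions keep many candidates, peaked ones keep few. The linear mapping here is our own stand-in for the paper's control rule.

```python
# Entropy-adaptive candidate-set size for decoding.
import math

def adaptive_bandwidth(probs, b_min=1, b_max=50):
    """Scale the candidate count with normalized entropy."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    frac = entropy / max_entropy if max_entropy > 0 else 0.0
    return max(b_min, round(b_min + frac * (b_max - b_min)))

peaked = [0.97] + [0.01] * 3     # model is confident: keep few candidates
flat = [0.25] * 4                # model is uncertain: keep many
```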
Paradigm Shift
SuperLocalMemory V3 establishes information-geometric foundations for agent memory, enabling high-accuracy retrieval without cloud-based LLM dependency.
Efficiency Breakthrough
FlashHead is a drop-in replacement for the LM classification head that provides 1.75x inference speedup by treating vocabulary selection as a retrieval problem.
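The retrieval framing can be sketched as a coarse-then-fine search: score a few cluster centroids, then compute exact logits only inside the best clusters. The random cluster assignment and probe depth are demo values, not FlashHead's.

```python
# Vocabulary selection as retrieval: centroid prefilter + exact rescoring.
import numpy as np

rng = np.random.default_rng(1)
vocab, dim, n_clusters, n_probe = 1000, 32, 10, 3
E = rng.standard_normal((vocab, dim))            # output embedding matrix
labels = rng.integers(0, n_clusters, vocab)      # toy cluster assignment
centroids = np.stack([E[labels == c].mean(axis=0) for c in range(n_clusters)])

def retrieval_argmax(h):
    best = np.argsort(-(centroids @ h))[:n_probe]   # coarse centroid scores
    cand = np.flatnonzero(np.isin(labels, best))    # tokens in top clusters
    return cand[np.argmax(E[cand] @ h)]             # exact logits on subset
```

With meaningful clusters (e.g. from k-means over the embedding rows) the candidate set stays small while recall of the true argmax stays high.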
Paradigm Shift
Introduces 'Delight', a policy-gradient weighting scheme that scales updates by the product of advantage and action surprisal to fix pathologies in RL training.
Scaling Insight
Determines the optimal compute distribution for retrieval agents, showing that re-ranking depth is far more critical than query expansion strength.
Paradigm Shift
Proposes the Spectrum Matching Hypothesis to explain why some VAE latents are 'undiffusable' and introduces techniques to align power spectral densities for superior image generation.
New Capability
Discovers interpretable 'atoms' of model behavior by decomposing training gradients, enabling unsupervised discovery and steering of complex behaviors like refusal or arithmetic.
Paradigm Shift
Introduces RenderMem, a spatial memory system that treats rendering as a query interface for embodied agents to reason about 3D geometry and occlusion.
Breaks Assumption
Reveals that larger language models are significantly better at concealing knowledge during audits, with detection traces vanishing beyond 70 billion parameters.
New Capability
Achieves pose-free 3D Gaussian Splatting using only event streams, enabling reconstruction in extreme lighting and high-speed motion scenarios.
Efficiency Breakthrough
Reformulates diffusion sampling as a graph-theoretic planning problem that dynamically allocates compute to the most difficult denoising stages.
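A minimal stand-in for difficulty-aware allocation: split a fixed step budget across denoising stages in proportion to an estimated difficulty score, with largest-remainder rounding. The graph-theoretic planner itself is not reproduced.

```python
# Allocate a step budget across stages proportionally to difficulty.
def allocate_steps(difficulty, budget):
    total = sum(difficulty)
    raw = [budget * d / total for d in difficulty]
    steps = [int(r) for r in raw]
    # Hand leftover steps to stages with the largest fractional remainders.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - steps[i],
                   reverse=True)
    for i in order[: budget - sum(steps)]:
        steps[i] += 1
    return steps
```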
Breaks Assumption
Formalizes the 'Visual Confused Deputy' attack, where agents are tricked into authorizing privileged actions via slight visual screen manipulations.