Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.
Breaks Assumption
Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.
New Capability
MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.
Paradigm Shift
ThinkStream introduces a 'Watch-Think-Speak' paradigm for video reasoning that allows models to incrementally update understanding and decide when to respond in real-time.
Breaks Assumption
This paper presents an exact federated unlearning protocol for foundation models that is pointwise identical to centralized retraining but uses fixed-size messages.
Efficiency Breakthrough
CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.
Breaks Assumption
This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.
Efficiency Breakthrough
Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.
New Capability
Multimodal OCR (MOCR) treats charts, diagrams, and tables as code-level targets (e.g., TikZ, SVG) rather than just cropping them as pixels.
Breaks Assumption
A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.
Paradigm Shift
Connects DDIM reverse chains to fractal geometry, providing a mathematical explanation for why diffusion models switch from global context to local detail.
Breaks Assumption
Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.
Efficiency Breakthrough
Induces pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.
Efficiency Breakthrough
Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.
Paradigm Shift
Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.
Efficiency Breakthrough
Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.
Efficiency Breakthrough
Enables RMSNorm to reuse MXFP8 block scales, shrinking the reduction operation by 32x and delivering a 2.4x kernel speedup.
Breaks Assumption
Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.
Breaks Assumption
STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or lights go out.
New Capability
Optimizes diffusion models via Direct Preference Optimization (DPO) to generate human motion that is inherently executable by real humanoid robots.
Paradigm Shift
Reimagines 3D molecules as continuous vector fields rather than discrete graphs, decoupling structure learning from atom types.
Scaling Insight
Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.
Paradigm Challenge
Time moving forward might just be a glitch caused by the universe being bad at copying its own homework.
Practical Magic
We’ve finally made digital messages that are physically impossible to copy—even a perfect hacker couldn't do it because physics won't allow it.
Nature Is Weird
Scientists built an AI that treats crop-raiding elephants like chess opponents to predict exactly where they’ll strike next.
Cosmic Scale
The massive satellite network the government uses is accidentally blasting out people's private passwords in plain text for anyone to see.
Open Release
OpenSanctions Pairs releases a massive benchmark for entity matching, proving that local LLMs can now match production rule-based systems in high-stakes compliance tasks.
Scaling Insight
Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.
Paradigm Shift
This paper introduces a graph tokenization framework that allows standard Transformers like BERT to beat specialized Graph Neural Networks without any architectural changes.
Efficiency Breakthrough
The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.
Breaks Assumption
Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.
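The entry above claims that routing patterns alone are enough for a linear classifier to recover the task. A minimal sketch of that idea follows; everything here (8 experts, two synthetic "tasks", a 70% routing bias, nearest-centroid as the linear decision rule) is an illustrative assumption, not the paper's actual setup:

```python
import random

random.seed(0)
NUM_EXPERTS = 8  # hypothetical MoE width

def routing_histogram(preferred, n_tokens=200):
    """Simulate one example's expert-usage frequencies:
    preferred experts are routed to 70% of the time."""
    counts = [0] * NUM_EXPERTS
    for _ in range(n_tokens):
        if random.random() < 0.7:
            e = random.choice(preferred)
        else:
            e = random.randrange(NUM_EXPERTS)
        counts[e] += 1
    return [c / n_tokens for c in counts]

# Two synthetic tasks with different preferred experts
TASKS = {"code": [0, 1], "math": [4, 5]}
train = [(routing_histogram(p), t) for t, p in TASKS.items() for _ in range(20)]

# Nearest-centroid classifier (a linear decision rule) over routing frequencies
centroids = {}
for task in TASKS:
    vecs = [v for v, t in train if t == task]
    centroids[task] = [sum(col) / len(vecs) for col in zip(*vecs)]

def classify(v):
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda t: dist(centroids[t]))

test = [(routing_histogram(p), t) for t, p in TASKS.items() for _ in range(20)]
acc = sum(classify(v) == t for v, t in test) / len(test)
```

If the per-task routing distributions are as distinct as the paper's 92.5% figure suggests, even this crude centroid rule separates them cleanly.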
New Capability
A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.
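A common way to make axis-aligned splits differentiable is to relax the hard threshold into a sigmoid; a depth-1 sketch of that relaxation follows, with the temperature, toy data, and learning rate all invented for illustration rather than taken from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-D data: a step function the tree should recover
xs = [i / 20 for i in range(21)]
ys = [1.0 if x > 0.3 else 0.0 for x in xs]

# Learnable parameters: split threshold t and the two leaf values a, b
t, a, b = 0.5, 0.5, 0.5
T = 0.1   # temperature of the sigmoid relaxation of the hard split
lr = 0.1

for _ in range(2000):
    gt = ga = gb = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid((x - t) / T)        # soft routing weight to the right leaf
        f = (1 - s) * a + s * b         # blended prediction of the two leaves
        d = 2 * (f - y) / len(xs)       # d(MSE)/d(f)
        ga += d * (1 - s)
        gb += d * s
        gt += d * (b - a) * s * (1 - s) * (-1 / T)
    t -= lr * gt
    a -= lr * ga
    b -= lr * gb
```

After training, the leaves approach 0 and 1 and the threshold settles near the true step at 0.3; because every operation is differentiable, the same split could sit inside a larger network and receive gradients from downstream layers.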
Efficiency Breakthrough
REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.
Efficiency Breakthrough
PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.
Paradigm Shift
Continual Representation Learning (CoRe) moves PEFT from weight-level updates to representation-space interventions, solving catastrophic forgetting in dynamic environments.
Scaling Insight
Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.
New Capability
SoLA introduces the first reversible model editing framework that allows precise revocation of specific knowledge updates.
Breaks Assumption
LLM-based user simulators create an 'easy mode' for agents that fails to capture real human frustration, ambiguity, and feedback nuances.
Breaks Assumption
Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.
Efficiency Breakthrough
InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.
Paradigm Shift
Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.
Paradigm Shift
HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.
Scaling Insight
Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.
New Capability
RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.
Efficiency Breakthrough
TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.
Breaks Assumption
MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.
Breaks Assumption
An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.
Paradigm Shift
This paper introduces Finsler geometry to manifold learning, allowing for the capture of asymmetric data relationships like density hierarchies that Riemannian methods ignore.
Breaks Assumption
Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.
Efficiency Breakthrough
DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.
Breaks Assumption
Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.
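The claim above follows from softmax never emitting exact zeros: attention weights must be strictly positive and sum to one, so "attending to nothing" is only possible by parking mass on a sink. A quick numeric check, with hypothetical scores not taken from the paper:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Without a sink: even if the model scores every content token very low,
# the weights still sum to 1 -- it cannot opt out of attending.
p = softmax([-10.0, -10.0, -10.0])   # uniform, each = 1/3

# With a dedicated sink token at score 0, content tokens get ~0 weight
# while the sink absorbs essentially all the attention mass.
q = softmax([0.0, -10.0, -10.0, -10.0])
```

Here `q[0]` exceeds 0.999 while each content weight falls below 1e-3, which is exactly the "null state" role the entry describes.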