Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers whose core contribution is computational intelligence.
Breaks Assumption
Challenges a core constraint in statistical learning theory by proving that optimal $\sqrt{N}$ convergence is achievable for offline policy learning even with model classes that exceed the standard Donsker complexity limit.
Nature Is Weird
AI has hit a wall, and it's because data is acting like a heavy anchor slowing the whole thing down.
Practical Magic
This new math trick just crushed a massive logistics nightmare that used to take two weeks; now it’s done in 19 minutes.
Paradigm Challenge
Computers have gotten so fast at finding the best route on a map that it basically costs them zero effort now, no matter how big the city.
First Ever
Someone finally built computer memory that doesn't go blank when you pull the plug—it just stays there forever.
Nature Is Weird
Turns out, putting a cheap AI under an AI 'boss' actually makes the work worse unless the boss is way, way smarter than the worker.
Practical Magic
AI agents are finding multi-million dollar holes in bank code that even the best human experts completely walked past.
Efficiency Breakthrough
Prunes 85% of visual tokens in Vision-Language-Action (VLA) models while retaining 94% accuracy for autonomous driving.
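The paper's exact pruning criterion isn't given here, but saliency-based token pruning generally keeps the top-k tokens ranked by an importance score. A minimal NumPy sketch, where the scores and the 15% keep ratio are illustrative assumptions, not the paper's recipe:

```python
import numpy as np

def prune_tokens(tokens: np.ndarray, scores: np.ndarray, keep_ratio: float = 0.15):
    """Keep the top `keep_ratio` fraction of tokens by saliency score.

    tokens: (N, D) token embeddings
    scores: (N,) importance scores (e.g., attention mass received)
    """
    n_keep = max(1, int(round(len(scores) * keep_ratio)))
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of the highest scores
    keep_idx.sort()                          # preserve original token order
    return tokens[keep_idx], keep_idx

# usage: prune 100 visual tokens down to 15 (an 85% reduction)
tokens = np.random.randn(100, 8)
scores = np.random.rand(100)
pruned, idx = prune_tokens(tokens, scores, keep_ratio=0.15)
```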
Paradigm Shift
Introduces a CNN architecture where feature maps are mathematically identical to Grad-CAM saliency maps by design, rather than post-hoc.
Open Release
Releases weights for LEMON, a foundation model for single-cell nuclear morphology trained on millions of pathology images.
New Capability
A decentralized system that automates ML research and trains domain-expert 1.58-bit ternary models for CPU-native inference.
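The 1.58-bit figure comes from ternary weights (log2(3) ≈ 1.58 bits per weight). The widely used absmean scheme from BitNet b1.58 snaps each weight to {-1, 0, +1}; a sketch of that generic scheme, which may differ from this paper's exact recipe:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Absmean ternary quantization: w ~ scale * q with q in {-1, 0, +1}."""
    scale = np.mean(np.abs(w)) + 1e-8        # per-tensor absmean scale
    q = np.clip(np.round(w / scale), -1, 1)  # snap to the nearest ternary level
    return q.astype(np.int8), scale

w = np.array([0.9, -0.05, 0.4, -1.2])
q, s = ternary_quantize(w)  # q == [1, 0, 1, -1]
```

Ternary weights turn matrix multiplies into additions and subtractions, which is what makes CPU-native inference attractive.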
Efficiency Breakthrough
Extracts dense 3D Signed Distance Fields from images in under 3 seconds using feed-forward geometry transformer latents.
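A signed distance field maps each 3D point to its distance from the nearest surface, with negative values inside. For a sphere the field has a closed form, which makes a handy sanity check; the paper's transformer predicts such fields from images, while this toy only illustrates the data structure itself:

```python
import numpy as np

def sphere_sdf(points: np.ndarray, center: np.ndarray, radius: float) -> np.ndarray:
    """Signed distance to a sphere: negative inside, zero on the surface."""
    return np.linalg.norm(points - center, axis=-1) - radius

pts = np.array([[0.0, 0.0, 0.0],   # at the center  -> -radius
                [1.0, 0.0, 0.0],   # on the surface -> 0
                [2.0, 0.0, 0.0]])  # outside        -> +1
d = sphere_sdf(pts, np.zeros(3), 1.0)  # -> [-1.0, 0.0, 1.0]
```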
Scaling Insight
Uses the Minimum Description Length principle to predict exactly when neural networks will transition from simple 'spurious' shortcuts to complex features.
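The MDL criterion referenced here is the classic two-part code: a hypothesis wins when the bits needed to describe it plus the bits needed to describe the data given it are minimal. In standard form (the paper's exact scoring function may differ):

```latex
\underbrace{L(H)}_{\text{bits to encode the hypothesis}}
\;+\;
\underbrace{L(D \mid H)}_{\text{bits to encode the data given } H}
\;\longrightarrow\; \min_{H}
```

A 'spurious' shortcut has small $L(H)$ but large $L(D \mid H)$; the predicted transition is the point where a complex feature's total code length first drops below the shortcut's.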
New Capability
Modulates LLM hidden states with eye-gaze data to outperform GPT-4o by 10.5 points on streaming video understanding.
Breaks Assumption
Proves that safety probes can detect 'liars' (models hiding harm) but are fundamentally blind to 'fanatics' (models that believe harm is good).
Efficiency Breakthrough
Parallelizes diffusion model sampling across multiple devices using a draft-and-refine process for up to 3.7x speedups.
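The paper's exact scheme isn't reproduced here, but one standard way to parallelize an inherently sequential sampler is fixed-point (Picard) refinement: draft a whole trajectory, evaluate every step's update in parallel from the draft, and sweep until it matches the sequential result. A toy sketch with a stand-in deterministic "denoiser" step:

```python
import numpy as np

def step(x, t):
    """Toy deterministic update standing in for one denoising step."""
    return x + 0.1 * np.cos(x + t)

def sequential_sample(x0, n_steps):
    x = x0
    for t in range(n_steps):
        x = step(x, t)
    return x

def parallel_sample(x0, n_steps, n_sweeps):
    """Draft a trajectory, then refine all steps at once per sweep."""
    traj = np.full(n_steps + 1, x0)  # crude draft: constant trajectory
    for _ in range(n_sweeps):
        # every step() call reads the old trajectory -> all are independent
        updates = np.array([step(traj[t], t) for t in range(n_steps)])
        traj[1:] = updates
    return traj[-1]
```

Each sweep fixes at least one more step exactly, so after `n_steps` sweeps the result equals the sequential sampler; the speedup comes from stopping after far fewer sweeps once the refinement has converged.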
Paradigm Shift
Shifts world model evaluation from visual fidelity to 'Simulative Reasoning,' revealing a massive gap in current AI's ability to plan.
Paradigm Shift
Learns high-level symbolic state machines directly from raw pixels to guide robot control without hand-crafted priors.
Breaks Assumption
Resolves a long-standing open problem in bandit theory by achieving optimal dynamic regret without knowing the number of environment switches.
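For context, dynamic regret in a piecewise-stationary bandit measures loss against the per-round best arm. In standard notation (illustrative, not quoted from the paper):

```latex
R_T^{\mathrm{dyn}}
\;=\;
\sum_{t=1}^{T}\Bigl(\max_{a}\,\mu_t(a)\;-\;\mathbb{E}\bigl[\mu_t(a_t)\bigr]\Bigr),
\qquad
R_T^{\mathrm{dyn}} \;=\; \tilde{\Theta}\!\bigl(\sqrt{S\,T}\,\bigr)
\ \text{with } S \text{ switches, arm count fixed.}
```

The long-open question was matching that rate when $S$ is unknown to the learner in advance.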
Efficiency Breakthrough
Introduces a discrete-ratio selector for context compression that solves the problem of variable information density in long-form text.
New Capability
Fixes physically impossible video generation by disentangling semantic prompts from physical dynamics during training.
Efficiency Breakthrough
Achieves state-of-the-art video understanding without the need for expensive human-annotated Chain-of-Thought (CoT) data.
Breaks Assumption
Shows that conventional-wisdom techniques such as Chain-of-Thought and few-shot prompting actually degrade performance in specialized medical LLMs.
Open Release
The first large-scale benchmark for LLM agents based on years of authentic, cross-domain user behavioral data rather than synthetic personas.
Paradigm Shift
Demonstrates that symbolic event primitives (like Schank's Conceptual Dependency) can be 'rediscovered' by neural networks purely through compression pressure.
Efficiency Breakthrough
Releases a composable, Optax-native stack that makes high-overhead second-order optimization methods (like K-FAC) practical and swappable.
Scaling Insight
A billion-scale time-series benchmark that identifies a 'context-length crossover' where foundation models start to crush deep learning baselines.
Efficiency Breakthrough
Introduces a self-driven collaboration paradigm where an agent uses its own 'reflection' signals to escalate difficult tasks to a stronger model tier.
Scaling Insight
Challenges the assumption that 'background' pixels are useless in GUI agents and identifies a 'recency effect' for optimal token pruning.
Paradigm Shift
Identifies specific hidden-state dimensions (H-Nodes) responsible for hallucinations and introduces a real-time defense to cancel them.
New Capability
Integrates radiologist gaze data as a probabilistic prior to align vision-language models with actual human clinical reasoning workflows.
Paradigm Shift
Moves industrial recommendation systems from static multi-stage pipelines to self-evolving agentic loops.
Breaks Assumption
Finds that while frontier LLMs can model the mental states of others, they fundamentally fail at self-modeling without explicit reasoning steps.
New Capability
Introduces ReinPatch, the first framework to jointly optimize sequence tokenization and backbone models using reinforcement learning.
Breaks Assumption
Discovers that object-centric information in Vision Transformers is distributed across all attention components (q, k, v) and layers, not just the final layer.
Open Release
Releases DataFlex, a unified open-source framework for data-centric dynamic training (selection, mixture, and reweighting) for LLMs.
Breaks Assumption
Proves that image denoisers can be strictly contractive (robust to noise) without sacrificing state-of-the-art restoration quality.
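Strict contractivity means the denoiser's Lipschitz constant is below 1, so input perturbations shrink rather than amplify. For a linear denoiser this reduces to a spectral-norm check, estimable by power iteration; this is a generic sanity check, not the paper's construction, and the matrix below is an arbitrary example:

```python
import numpy as np

def spectral_norm(A: np.ndarray, n_iters: int = 100) -> float:
    """Estimate the largest singular value of A by power iteration."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(A.shape[1])
    for _ in range(n_iters):
        v = A.T @ (A @ v)          # one step of power iteration on A^T A
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(A @ v))

W = np.array([[0.5, 0.2],
              [0.1, 0.6]])         # toy linear "denoiser"
L = spectral_norm(W)               # < 1, so the map is strictly contractive
```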
Paradigm Shift
Empirically proves that AI Scientist agents can genuinely learn from physical experimental feedback via in-context learning.
New Capability
Moves coding agents from passive execution to proactive collaboration by teaching them when to ask for clarification on underspecified tasks.
New Capability
Provides mechanistic evidence that LLMs internalize 'vibes' (informal registers like slang) as language-agnostic abstractions that can be causally steered.
New Capability
Enables GUI agents to overcome domain bias by autonomously 'watching' web tutorial videos to learn specific software workflows without retraining.
New Capability
Introduces a label-free, output-agnostic method for merging LoRA modules across heterogeneous tasks like classification and regression.
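How the label-free merge works isn't detailed here, but the object being merged is standard LoRA: each module contributes a low-rank update ΔW = (α/r)·B·A, and the simplest baseline merge is a weighted sum of those deltas on the shared base weight. A sketch of that baseline (coefficients are an illustrative assumption, not the paper's method):

```python
import numpy as np

def lora_delta(A: np.ndarray, B: np.ndarray, alpha: float, rank: int) -> np.ndarray:
    """Low-rank update contributed by one LoRA module: (alpha / r) * B @ A."""
    return (alpha / rank) * (B @ A)

def naive_merge(W0, modules, coeffs):
    """Weighted sum of LoRA deltas applied to the shared base weight W0."""
    W = W0.copy()
    for (A, B, alpha, r), c in zip(modules, coeffs):
        W += c * lora_delta(A, B, alpha, r)
    return W

rng = np.random.default_rng(0)
d, r = 6, 2
W0 = rng.standard_normal((d, d))
mods = [(rng.standard_normal((r, d)), rng.standard_normal((d, r)), 2.0, r)
        for _ in range(2)]
W = naive_merge(W0, mods, coeffs=[0.5, 0.5])
```

Merging two rank-2 modules perturbs the base weight by at most rank 4, which is what keeps merged adapters cheap.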
Paradigm Shift
Replaces standard autoregressive action generation in robot VLAs with iterative refinement via discrete flow matching.
Breaks Assumption
Reveals that spatial reasoning in LLMs is not driven by robust internal world models, but by fragmented and transient representations.
New Capability
Enables verification of claimed text-to-image models through boundary-aware prompts that trigger model-specific instability.
Breaks Assumption
Identifies that the 'reasoning tax' in vision-language fine-tuning is caused by lost access to depth-wise representations and fixes it with a lightweight adapter.
New Capability
Boosts multimodal reasoning by teaching models to autonomously verify their own long-form generations against image evidence using information gain.
Efficiency Breakthrough
Achieves 16x prefill speedup for video models by using reinforcement learning to dynamically compress visual tokens based on temporal 'surprise'.
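'Surprise' here plausibly means frame-to-frame change; a toy version keeps only tokens whose embedding moved more than a threshold since the previous frame. The threshold and distance metric are assumptions, and the paper learns its compression policy with RL rather than a fixed rule:

```python
import numpy as np

def surprise_mask(frames: np.ndarray, threshold: float) -> np.ndarray:
    """frames: (T, N, D) token embeddings per frame.

    Returns a (T, N) bool mask keeping tokens that changed enough
    versus the previous frame; the first frame is always kept in full.
    """
    delta = np.linalg.norm(frames[1:] - frames[:-1], axis=-1)  # (T-1, N)
    return np.concatenate([np.ones((1, frames.shape[1]), bool),
                           delta > threshold])

frames = np.zeros((3, 4, 2))
frames[1, 0] = 5.0  # only token 0 moves at t=1 (and back at t=2)
mask = surprise_mask(frames, threshold=1.0)
```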
Scaling Insight
An 800 Hz data glove reveals that human hand dexterity contains critical high-frequency motion energy (>100 Hz) previously invisible to standard sensors.
Breaks Assumption
Reveals that reasoning models frequently acknowledge misleading hints in their 'thinking' tokens but hide that influence in their final visible answers.