Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.
Practical Magic
Imagine a cell tower on wheels that literally follows you around with a camera just to make sure your bars never drop.
Nature Is Weird
After 90 years of scratching their heads, mathematicians finally proved that 'Quantum Logic' isn't just a mess—it actually works.
Paradigm Challenge
Perfectly syncing clocks across the world is actually impossible because of physics, so things like Leap Seconds are basically just a polite lie.
Breaks Assumption
Large Language Models can perfectly reconstruct training data that alignment strictly forbids them from expressing in standard generation.
Efficiency Breakthrough
MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.
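A minimal sketch of the overlap idea, with toy stand-ins for both models (MineDraft's actual scheduler is not shown here): while the target model verifies draft block k, the draft model already speculates block k+1 under the optimistic assumption that every token is accepted.

```python
# Overlapped speculative decoding, illustrative only: draft and verify
# run concurrently instead of strictly alternating.
from concurrent.futures import ThreadPoolExecutor
import time

def draft_block(prefix, k=4):
    time.sleep(0.01)                      # stand-in for the small draft model
    return [f"tok{len(prefix) + i}" for i in range(k)]

def verify_block(prefix, block):
    time.sleep(0.03)                      # stand-in for the large target model
    return block                          # toy: every draft token is accepted

def generate(n_tokens=16):
    out = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(draft_block, out)
        while len(out) < n_tokens:
            block = pending.result()
            # Draft block k+1 in the background while block k is verified,
            # optimistically assuming the current block survives verification.
            pending = pool.submit(draft_block, out + block)
            out.extend(verify_block(out, block))
    return out[:n_tokens]

print(generate())
```

In a real system a rejection invalidates the in-flight draft, so a throughput gain of this kind presumably depends on a high acceptance rate.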
Paradigm Shift
A geometric fix for Rotary Positional Embeddings (RoPE) allows Transformers to generalize to long inputs out-of-the-box by preserving 'sink token' functionality.
New Capability
Engineered modularity via per-layer supervision solves the 'Hydra effect,' allowing surgical control of specific model behaviors.
Breaks Assumption
Naive multi-agent routing based on self-reported quality scores results in a 'provenance paradox' that performs worse than random selection.
New Capability
NANOZK enables verifiable LLM inference with 70x smaller proofs and 24ms verification time using a novel layerwise decomposition.
Scaling Insight
Extreme neural network sparsification causes a catastrophic interpretability collapse even when global accuracy remains stable.
Paradigm Shift
A synthesizable RTL implementation of Predictive Coding allows for fully distributed, non-backprop learning directly in hardware.
Paradigm Shift
Dynamic constraints using an 'online refiner' resolve the conflict between stability and performance in Reinforcement Learning Fine-Tuning (RFT).
Efficiency Breakthrough
Q-Drift corrects quantization-induced noise in diffusion models using a plug-and-play sampler adjustment that requires only 5 calibration runs.
Efficiency Breakthrough
Achieves depth-independent training memory bounded to approximately twice the inference footprint.
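The summary doesn't say how the bound is achieved; reversible computation and activation checkpointing are the standard routes to it. A sketch of the simpler of the two, using PyTorch's stock checkpoint utility:

```python
# Illustrative only: activation checkpointing trades recomputation for memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class DeepNet(nn.Module):
    def __init__(self, depth=64, width=256):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.GELU()) for _ in range(depth)]
        )

    def forward(self, x):
        for block in self.blocks:
            # Recompute this block's internal activations on backward instead
            # of storing them; reversible blocks go further and drop even the
            # inter-block activations, which a ~2x-inference bound requires.
            x = checkpoint(block, x, use_reentrant=False)
        return x

out = DeepNet()(torch.randn(8, 256))
out.sum().backward()
```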
New Capability
Solves the problem of 'co-firing' conflicts in probabilistic ML routing systems using temperature-scaled softmax partitioning.
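A hypothetical sketch of the mechanism as named: temperature scaling sharpens the routing softmax so near-tied experts stop firing on the same request. Names and numbers are illustrative, not the paper's.

```python
# Temperature-scaled softmax routing: lower temperature -> sharper partition.
import numpy as np

def route(scores, temperature):
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                          # numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = [2.1, 2.0, 0.3]                  # experts 0 and 1 are nearly tied
print(route(scores, temperature=1.0))     # soft: both experts co-fire often
print(route(scores, temperature=0.1))     # sharp: expert 0 owns the partition
```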
Efficiency Breakthrough
A decoder-free world model that trains 1.59x faster than DreamerV3 while outperforming it on tasks with small, task-relevant objects.
Paradigm Shift
Uses Pearl's do-operator to automatically discover and mask irrelevant state dimensions in Reinforcement Learning.
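A toy version of the idea under stated assumptions: simulate do-interventions on each state dimension, measure the effect on a value estimate, and mask dimensions whose interventions change nothing. The `discover_mask` helper and its tolerance are hypothetical, not the paper's algorithm.

```python
# do(s_d := v): intervene on one state dimension at a time and keep only the
# dimensions that causally affect the value estimate.
import numpy as np

def discover_mask(value_fn, state, n_samples=64, tol=1e-3):
    rng = np.random.default_rng(0)
    base = value_fn(state)
    mask = np.ones(len(state), dtype=bool)
    for d in range(len(state)):
        effects = []
        for _ in range(n_samples):
            s = state.copy()
            s[d] = rng.normal()            # the intervention on dimension d
            effects.append(abs(value_fn(s) - base))
        if np.mean(effects) < tol:
            mask[d] = False                # dimension d is causally irrelevant
    return mask

value = lambda s: s[0] ** 2 + 0.5 * s[2]   # dims 0 and 2 matter; dim 1 doesn't
print(discover_mask(value, np.array([0.5, 1.0, -0.3])))  # [ True False  True]
```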
Efficiency Breakthrough
Fixes the 'squeezing effect' in Direct Preference Optimization (DPO) using an efficient logit-space Sharpness-Aware Minimization.
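A hedged sketch of Sharpness-Aware Minimization moved from weight space to logit space, with a simplified stand-in for the DPO objective; the paper's exact perturbation rule is not reproduced here.

```python
# Logit-space SAM sketch: take the worst-case perturbation of the *logits*
# (norm rho), then update the weights at the perturbed point.
import torch
import torch.nn.functional as F

def dpo_loss(logit_gap, beta=0.1):
    # Toy DPO stand-in: logit_gap ~ log p(chosen) - log p(rejected).
    return -F.logsigmoid(beta * logit_gap).mean()

model = torch.nn.Linear(16, 1)             # toy: predicts the logit gap directly
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, rho = torch.randn(32, 16), 0.05

# 1) Gradient of the loss w.r.t. the logits, not the weights.
logits = model(x).squeeze(-1)
g = torch.autograd.grad(dpo_loss(logits), logits)[0]

# 2) SAM ascent step in logit space, then a normal weight update there.
eps = rho * g / (g.norm() + 1e-12)
loss = dpo_loss(model(x).squeeze(-1) + eps)
opt.zero_grad()
loss.backward()
opt.step()
```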
Breaks Assumption
Demonstrates that safety alignment is a routing mechanism, not a knowledge filter, rendering current refusal-based benchmarks ineffective.
Paradigm Shift
Fine-tunes Vision-Language Models using raw images alone by using a text-to-image model as a cycle-consistency reward.
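A hypothetical sketch of the cycle: caption the image with the VLM, regenerate it with a text-to-image model, and reward agreement between the original and the reconstruction. `vlm_caption`, `t2i_generate`, and `image_embed` are placeholders, not the paper's components.

```python
# Cycle-consistency reward: image -> caption -> regenerated image -> similarity.
import numpy as np

def image_embed(image):
    v = image.reshape(-1)
    return v / (np.linalg.norm(v) + 1e-8)

def cycle_reward(image, vlm_caption, t2i_generate):
    caption = vlm_caption(image)                 # forward half of the cycle
    reconstruction = t2i_generate(caption)       # backward half of the cycle
    # Cosine similarity of embeddings serves as the fine-tuning reward.
    return float(image_embed(image) @ image_embed(reconstruction))

# Toy usage with trivial stand-ins for both models:
img = np.random.rand(8, 8)
print(cycle_reward(img, vlm_caption=lambda im: "a toy image",
                   t2i_generate=lambda cap: img + 0.01 * np.random.rand(8, 8)))
```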
Efficiency Breakthrough
PreSCAN predicts NeRF reconstruction quality in under 30 seconds, achieving a 1000x speedup over Neural Architecture Search.
Scaling Insight
This paper provides a theoretical proof that autocurriculum, where a model selects its own training problems, requires exponentially fewer reasoning demonstrations.
Breaks Assumption
FaithSteer-BENCH reveals that inference-time steering often creates 'illusory' control that collapses under minor prompt perturbations.
New Capability
MemArchitect introduces a governance layer that decouples memory lifecycle management from LLM weights to prevent 'zombie memories.'
Breaks Assumption
A systematic study finds that mechanistic interpretability methods fail to correct model errors even when internal representations are 98% accurate.
Paradigm Shift
PowerFlow uses GFlowNets to replace heuristic rewards in unsupervised fine-tuning, allowing practitioners to explicitly tune models for either logic or creativity.
Breaks Assumption
This study identifies 'Visual Sycophancy' in VLMs, where models detect visual truths internally but hallucinate incorrect answers to satisfy user expectations.
New Capability
LLM agents can now autonomously re-identify anonymous individuals by combining sparse, non-identifying cues with public data.
New Capability
VISTA decouples hypothesis generation from prompt rewriting to escape both the local optima and the black-box nature of current automatic prompt optimizers.
Efficiency Breakthrough
TopoChunker maps documents to a Structured Intermediate Representation (SIR) to preserve hierarchical context during RAG chunking.
New Capability
TARo introduces a learnable token-level router that steers frozen LLMs toward structured reasoning at test-time without retraining.
Efficiency Breakthrough
AFBS-BO automates the discovery of layer-specific sparse attention hyperparameters, making long-context acceleration 'plug-and-play.'
Scaling Insight
The 'Progressive Intensity Hypothesis' establishes that weaker perturbations (pruning) should precede stronger ones (quantization) for optimal joint model compression.
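A minimal numpy sketch of the prescribed ordering, assuming magnitude pruning and symmetric uniform quantization as the two perturbations; the paper's actual compression pipeline may differ.

```python
# Joint compression in the hypothesized order: the weaker perturbation
# (pruning) is applied before the stronger one (low-bit quantization).
import numpy as np

def prune(w, sparsity=0.5):
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < thresh, 0.0, w)

def quantize(w, bits=4):
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale        # symmetric uniform quantization

w = np.random.randn(256, 256)
w_pq = quantize(prune(w))                     # prescribed order
w_qp = prune(quantize(w))                     # reversed order, for comparison
print(np.linalg.norm(w - w_pq), np.linalg.norm(w - w_qp))
```

Comparing reconstruction error across the two orderings is exactly the ablation the hypothesis predicts should favor prune-then-quantize.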
Paradigm Shift
AS2 achieves a fully differentiable neuro-symbolic bridge by replacing discrete solvers with a soft, continuous approximation of the Answer Set Programming operator.
Efficiency Breakthrough
Discounted Beta-Bernoulli (DBB) reward estimation solves the variance collapse and sample inefficiency inherent in point-estimation RLVR methods for LLM reasoning.
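A sketch of the general construction the name suggests: a Beta posterior over binary pass/fail rewards whose evidence counts are geometrically discounted, so the estimate retains variance (unlike a point estimate) and tracks non-stationary pass rates. DBB's exact update may differ.

```python
# Discounted Beta-Bernoulli estimator for binary verifiable rewards.
class DiscountedBetaBernoulli:
    def __init__(self, gamma=0.95, alpha=1.0, beta=1.0):
        self.gamma, self.alpha, self.beta = gamma, alpha, beta

    def update(self, passed: bool):
        # Geometrically discount old evidence, then add the new outcome.
        self.alpha = self.gamma * self.alpha + float(passed)
        self.beta = self.gamma * self.beta + float(not passed)

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    def variance(self):
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1.0))

est = DiscountedBetaBernoulli()
for outcome in [True, True, False, True]:
    est.update(outcome)
print(est.mean(), est.variance())   # variance never collapses to zero
```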
New Capability
AcceRL introduces a fully asynchronous, decoupled RL framework for Vision-Language-Action (VLA) models that integrates a plug-and-play world model.
Breaks Assumption
Multimodal LLMs suffer from a 'cognitive mismatch' where they succeed at complex reasoning while failing at basic discrete symbol recognition.
Paradigm Shift
Standard decoding strategies (top-k, nucleus) create a 'truncation blind spot' by systematically excluding human-like, low-probability token choices.
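A concrete view of the blind spot using standard nucleus sampling: every token outside the smallest prefix reaching mass p gets exactly zero probability, so human-like low-probability choices become unsampleable.

```python
# Nucleus (top-p) filtering: tokens past the cumulative-mass cutoff are
# assigned zero probability, the "truncation blind spot" in question.
import numpy as np

def nucleus_filter(probs, p=0.9):
    order = np.argsort(probs)[::-1]
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, p) + 1  # smallest prefix reaching mass p
    filtered = np.zeros_like(probs)
    filtered[order[:cutoff]] = probs[order[:cutoff]]
    return filtered / filtered.sum()

probs = np.array([0.50, 0.30, 0.15, 0.04, 0.01])
print(nucleus_filter(probs, p=0.9))        # last two tokens get exactly zero
```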
Efficiency Breakthrough
EntropyCache achieves up to 26x speedup for Diffusion Language Models by using decoded token entropy as a proxy for KV cache staleness.
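A hypothetical sketch of the proxy: token-distribution entropy is cheap to compute and can gate when a cached KV block is refreshed. The threshold and refresh policy here are invented for illustration.

```python
# Entropy of the decoded-token distribution as a staleness signal.
import numpy as np

def entropy(probs, eps=1e-12):
    p = np.clip(probs, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def should_refresh_cache(token_probs, threshold=1.0):
    # High entropy -> the model is uncertain -> the cached context may be
    # stale; low entropy -> reuse the cache and skip recomputation.
    return entropy(token_probs) > threshold

confident = np.array([0.97, 0.01, 0.01, 0.01])
uncertain = np.full(4, 0.25)
print(should_refresh_cache(confident), should_refresh_cache(uncertain))
```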
Efficiency Breakthrough
AIMER provides a calibration-free criterion for expert pruning in MoE models that matches state-of-the-art performance in seconds.
Scaling Insight
Mechanistic analysis of 'counting circuits' in VLMs allows for lightweight interventions that improve general visual reasoning performance.
New Capability
Generative 3D world models are used to scale Sim-to-Real reinforcement learning for robot Vision-Language-Action (VLA) models.
Efficiency Breakthrough
DDPO addresses the 'overthinking' and 'overconfidence' issues in Large Reasoning Models (LRMs) by optimizing answer length based on task difficulty.
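An illustrative reward-shaping sketch in the spirit of the summary: length is penalized hardest on easy tasks (curbing overthinking) and lightly on hard ones. The coefficient schedule is an assumption, not DDPO's objective.

```python
# Difficulty-aware length penalty: easy tasks get the strongest penalty.
def shaped_reward(correct: bool, answer_tokens: int, difficulty: float):
    # difficulty in [0, 1]; the length coefficient shrinks as tasks get harder.
    length_coeff = 0.002 * (1.0 - difficulty)
    return float(correct) - length_coeff * answer_tokens

print(shaped_reward(True, 800, difficulty=0.1))   # easy + long: heavily docked
print(shaped_reward(True, 800, difficulty=0.9))   # hard + long: mild penalty
```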
Scaling Insight
Synthetic data scaling reaches a new level by moving from simple rephrasing to creating 'megadocs' through rationale insertion and stitching.
Paradigm Shift
SINDy-KANs combine Kolmogorov-Arnold Networks with Sparse Identification of Non-linear Dynamics to create parsimonious, interpretable models.
Open Release
SpecForge provides an open-source framework and high-quality draft models (SpecBundle) to make speculative decoding production-ready.
Breaks Assumption
The legally mandated right to be forgotten (unlearning) can be weaponized as an adversarial attack surface to collapse model accuracy.
New Capability
Learning to Self-Evolve (LSE) trains LLMs to explicitly improve their own context at test-time via reinforcement learning.
Open Release
OpenT2M is a massive open-source motion dataset (2,800+ hours) that addresses the data starvation in text-to-motion generation.
Paradigm Shift
REST transforms the zero-shot object-navigation problem from simple waypoint selection to a tree-of-paths reasoning process.