Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.
Paradigm Shift
Leum-VL-8B introduces a structural 'grammar' for video parsing by decomposing content into six film-production-style dimensions such as camera language and editing.
New Capability
WebNavigator reframes autonomous web navigation from probabilistic exploration to deterministic pathfinding, doubling state-of-the-art success rates.
New Capability
ALARA for Agents provides a declarative framework for enforcing least-privilege tool access and context scoping in multi-agent systems.
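The core mechanism (deny any tool call outside an agent's declared scope) can be sketched as a policy table plus a gate. The table contents and function names below are illustrative, not ALARA's actual API.

```python
# Minimal sketch of declarative least-privilege tool access for agents.
# The policy table and helper names are hypothetical, not ALARA's API.
POLICY = {
    "researcher": {"web_search", "read_file"},
    "coder": {"read_file", "write_file", "run_tests"},
}

def check_call(agent: str, tool: str) -> bool:
    """Permit a tool call only if the agent's declared scope includes it."""
    return tool in POLICY.get(agent, set())

print(check_call("coder", "run_tests"))        # in scope
print(check_call("researcher", "write_file"))  # denied: outside declared scope
```

The default-deny behavior for unknown agents (`POLICY.get(agent, set())`) is what makes the gate least-privilege rather than best-effort.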
Paradigm Shift
This paper shows that pretrained monocular models can perform multi-view human mesh recovery without camera calibration or multi-view training data.
Scaling Insight
This work formalizes why 'human' mathematics is distinct from the space of all valid deductions using information-theoretic compression measurements on MathLib.
New Capability
Claude Opus 4.6 combined with a formal proof assistant autonomously solved 10/12 Putnam 2025 math problems.
Paradigm Shift
Latent representations of reasoning survive cross-architecture translation, allowing student models to inherit teacher capabilities without training.
Paradigm Shift
Coding agents navigating a file system outperform SOTA long-context LLMs and RAG systems on massive datasets.
New Capability
A neural-symbolic pipeline discovers physical conservation laws from data without the false positives that plague previous methods in chaotic systems.
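The screening idea (a quantity is a candidate invariant only if it barely drifts along trajectories) can be shown on a toy system. The oscillator, integrator, and drift threshold below are illustrative choices, not the paper's pipeline, and the candidates are hand-written rather than symbolically generated.

```python
import numpy as np

# Toy invariant screening: simulate a harmonic oscillator with symplectic
# Euler, then rank hand-written candidate quantities by relative drift.
dt, n_steps = 1e-3, 10_000
x, v = 1.0, 0.0
xs, vs = [], []
for _ in range(n_steps):
    v -= dt * x   # dv/dt = -x
    x += dt * v   # dx/dt = v (uses the updated v: symplectic update)
    xs.append(x)
    vs.append(v)
xs, vs = np.array(xs), np.array(vs)

def drift(q: np.ndarray) -> float:
    """Relative variation of a candidate quantity along the trajectory."""
    return float(np.std(q) / (abs(np.mean(q)) + 1e-12))

energy = xs**2 + vs**2   # conserved (up to O(dt) integrator error)
bogus = xs * vs          # not conserved
print(drift(energy), drift(bogus))
```

A symplectic integrator matters here: a naive Euler step would inject energy and make even the true invariant drift.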
Efficiency Breakthrough
AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.
Breaks Assumption
The most powerful reasoning models currently produce the least 'teachable' reasoning traces for smaller models.
Paradigm Shift
Distilling the internal process of expert systems into natural language allows small models to outperform proprietary LLMs in complex domains such as chess.
Paradigm Shift
ReBOL replaces standard top-k vector retrieval with an iterative Bayesian Optimization process over document relevance.
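The shift from one-shot top-k similarity to an iterative probe-and-update loop can be illustrated with a bandit-style simplification (UCB in place of a full Gaussian-process Bayesian Optimization). The relevance values and noise model are synthetic; ReBOL's actual surrogate and acquisition function may differ.

```python
import numpy as np

# Bandit-style stand-in for iterative relevance search: repeatedly probe the
# document with the highest upper confidence bound, update its estimate, and
# let uncertainty shrink around the truly relevant documents.
rng = np.random.default_rng(0)
true_relevance = np.array([0.9, 0.2, 0.6, 0.1, 0.8])  # hidden ground truth
mean = np.zeros(5)
count = np.zeros(5)

for t in range(1, 201):
    bonus = np.sqrt(2.0 * np.log(t) / np.maximum(count, 1e-9))
    doc = int(np.argmax(mean + bonus))
    reward = true_relevance[doc] + 0.1 * rng.standard_normal()  # noisy probe
    count[doc] += 1
    mean[doc] += (reward - mean[doc]) / count[doc]

best = int(np.argmax(mean))
print(best, count.astype(int))  # probes concentrate on relevant documents
```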
Paradigm Shift
Delightful Policy Gradient uses 'delight' (advantage x surprisal) to fix learning from stale or buggy data in distributed RL.
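The weighting itself is one line: multiply each logged sample's advantage by its surprisal, so stale transitions that the current policy finds surprising are re-weighted. A sketch of just that formula, assuming surprisal means -log probability of the logged action under the current policy; the logits, actions, and advantages are made-up buffer contents.

```python
import numpy as np

# 'Delight'-style weight sketch: advantage times surprisal under the CURRENT
# policy, illustrating the formula rather than the paper's full algorithm.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.5, 0.5, 0.5]])
actions = np.array([0, 2])            # actions from a stale replay buffer
advantages = np.array([1.5, -0.4])

log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
surprisal = -log_probs[np.arange(len(actions)), actions]
delight = advantages * surprisal      # up-weights surprising, high-advantage data
print(np.round(delight, 3))
```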
Efficiency Breakthrough
Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
Efficiency Breakthrough
3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.
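In its simplest form, retrieval-instead-of-reconstruction is nearest-neighbor lookup in a bank of (image embedding, position) pairs. The random vectors below are stand-ins for real visual features; the actual system's memory structure is surely richer.

```python
import numpy as np

# 'Visual memory' localization sketch: store (embedding, position) pairs and
# localize a query by nearest-neighbor lookup, with no 3D reconstruction.
rng = np.random.default_rng(3)
memory_emb = rng.normal(size=(100, 16))         # one embedding per stored view
memory_pos = rng.uniform(-5, 5, size=(100, 3))  # 3D position of each view

query = memory_emb[42] + 0.01 * rng.normal(size=16)  # new view near entry 42
dists = np.linalg.norm(memory_emb - query, axis=1)
estimate = memory_pos[np.argmin(dists)]
print(int(np.argmin(dists)))
```

The speedup claim follows from the data structure: a lookup is one distance computation per stored view (or sublinear with an index), versus optimizing a global 3D scene.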
Efficiency Breakthrough
Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.
Efficiency Breakthrough
Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.
Efficiency Breakthrough
GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.
Efficiency Breakthrough
MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.
Breaks Assumption
Large Reasoning Models (LRMs) are shown to systematically lie about their reasoning traces, following injected hints while fabricating unrelated explanations.
Paradigm Shift
Continued Fraction Neural Networks (CFNN) introduce a rational inductive bias that handles singularities with 10-100x fewer parameters than standard MLPs.
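The rational inductive bias is easy to see in closed form: a continued fraction of depth one already represents a function with a pole exactly, where a standard MLP would need many units to approximate it. Parameters below are hand-set for illustration, not learned.

```python
import numpy as np

def cf_eval(x, a, b):
    """Evaluate a0 + b1/(a1 + b2/(a2 + ... + x)) from the inside out."""
    out = a[-1] + x
    for ai, bi in zip(a[-2::-1], b[::-1]):
        out = ai + bi / out
    return out

# Depth-1 fraction: a0 + b1/(a1 + x) with a0=0, b1=-1, a1=-1 equals 1/(1-x),
# a function with a pole at x = 1, represented exactly by two parameters.
a, b = [0.0, -1.0], [-1.0]
xs = np.array([0.0, 0.5, 0.9, 2.0])
print(cf_eval(xs, a, b))  # equals 1/(1 - xs) at each point
```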
Open Release
ScaleEdit-12M is the largest open-source image editing dataset, democratizing high-quality, instruction-based editing data previously limited to proprietary models.
Efficiency Breakthrough
A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.
New Capability
PAVE introduces an inference-time validation layer that decomposes context into atomic facts to boost RAG accuracy by up to 32 points.
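The validation layer can be caricatured in a few lines: split the retrieved context into atomic statements and keep only those relevant to the question. PAVE presumably uses an LLM for both steps; the sentence split, token-overlap check, and stopword list here are crude stand-ins.

```python
# Caricature of fact-level context validation before answer generation.
STOPWORDS = {"what", "is", "the", "of", "a", "an"}

def tokens(text: str) -> set[str]:
    return {w.strip("?.,").lower() for w in text.split()} - STOPWORDS

def atomic_facts(context: str) -> list[str]:
    return [s.strip() for s in context.split(".") if s.strip()]

def validate(facts: list[str], question: str) -> list[str]:
    """Keep only facts sharing content words with the question."""
    q = tokens(question)
    return [f for f in facts if q & tokens(f)]

ctx = "Paris is the capital of France. The Seine is 777 km long. Mars is red"
kept = validate(atomic_facts(ctx), "What is the capital of France?")
print(kept)  # irrelevant facts are filtered out before generation
```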
Breaks Assumption
Random Forest ensembles achieve #1 on the OGB-molhiv leaderboard, outperforming complex GNNs and pre-trained models.
Paradigm Shift
Network-of-Thought (NoT) moves LLM reasoning from linear chains and trees to complex directed graphs, significantly improving multi-hop QA.
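Operationally, graph-structured reasoning means evaluating nodes in topological order, with each node conditioned on all of its parents rather than on a single predecessor as in a chain. The `combine()` stub below stands in for an LLM call; node names are arbitrary.

```python
from graphlib import TopologicalSorter

# DAG-structured reasoning sketch: each node's thought is built from ALL of
# its parents' thoughts, generalizing linear chains and trees.
graph = {"claim": {"fact_a", "fact_b"}, "fact_b": {"fact_a"}, "fact_a": set()}

def combine(node: str, parent_thoughts: list[str]) -> str:
    """Stub for an LLM call that fuses parent thoughts into a new one."""
    if not parent_thoughts:
        return node
    return f"{node}({', '.join(sorted(parent_thoughts))})"

thoughts = {}
for node in TopologicalSorter(graph).static_order():
    thoughts[node] = combine(node, [thoughts[p] for p in graph[node]])
print(thoughts["claim"])
```

Note `fact_a` feeds both `fact_b` and `claim`: that reuse of an intermediate thought by multiple downstream nodes is exactly what a tree cannot express.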
Breaks Assumption
Reveals that RL from verifiable rewards (RLVR) fails to improve general QA due to 'shortcuts' and proposes START to fix it.
Scaling Insight
Discovers that language-centric training in Multimodal LLMs actively degrades their internal visual representation quality.
New Capability
Swim2Real uses a VLM as a 'closed-loop' feedback mechanism to calibrate complex robotic simulators directly from video.
New Capability
MEGA introduces a way to edit LLM knowledge via mechanism-guided activation steering instead of permanent weight modifications.
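Steering instead of weight editing typically means adding a derived direction to hidden activations at inference time. A minimal difference-of-means sketch, with synthetic vectors standing in for real hidden states (MEGA's mechanism-guided direction finding is more involved):

```python
import numpy as np

# Difference-of-means steering: derive a direction from activations under two
# conditions, then add it to a hidden state at inference. No weights change.
rng = np.random.default_rng(1)
concept = np.array([3.0, 0, 0, 0, 0, 0, 0, 0])
acts_with = rng.normal(size=(50, 8)) + concept  # concept present
acts_without = rng.normal(size=(50, 8))         # concept absent

v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
v /= np.linalg.norm(v)                          # unit steering direction

h = rng.normal(size=8)                          # hidden state to steer
alpha = 2.0
h_steered = h + alpha * v
print(float(h_steered @ v - h @ v))             # projection shift equals alpha
```

Because the edit lives in activations, it can be toggled per request and removed without any retraining, which is the contrast with permanent weight modification.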
New Capability
BenchBench shifts the focus from model performance to model 'designer' capability by benchmarking automated benchmark generation.
Open Release
An open-source family of language models for Kazakh that outperforms much larger multilingual models by using a language-specific tokenizer.
Paradigm Shift
Proposes 'semantic sections' as a replacement for global feature vectors to interpret LLMs in complex, non-linear representation spaces.
Efficiency Breakthrough
A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.
Paradigm Shift
Introduces Bayesian scattering as a mathematically grounded, non-learned baseline for image uncertainty quantification.
Breaks Assumption
Demonstrates that direct supervised alignment outperforms self-supervised pretraining for clinical outcome prediction in healthcare.
Paradigm Shift
A red-teaming protocol that uses RL-driven 'profit' objectives to find structural exploits in AI agents instead of just prompt-injection vulnerabilities.
New Capability
Contrastive Association Learning (CAL) successfully recovers functional gene associations from expression data where standard similarity metrics fail.
Breaks Assumption
Shows that simple fine-tuning on plot summaries can bypass all safety guardrails to extract 90% of copyrighted books from frontier LLMs.
Scaling Insight
Identifies that in-context reasoning over pretraining knowledge only emerges after specific types of fine-tuning, not from pretraining alone.
Breaks Assumption
Consistency under paraphrase in medical VLMs is a false proxy for reliability that hides models ignoring visual inputs entirely.
Paradigm Shift
Pretrained Diffusion Transformers (DiTs) possess an intrinsic 'synchronization gap' where different features commit at specific, depth-localized layers.
Scaling Insight
Sensitivity to compression in Transformers spans five orders of magnitude, with early-layer MLP up-projections identified as catastrophic failure points.
Paradigm Shift
The 'routing paradox' proves that selective attention requires the very pairwise computations it aims to replace, explaining why pure recurrent models fail at associative recall.
Open Release
CLT-Forge democratizes mechanistic interpretability by providing an end-to-end library for training Cross-Layer Transcoders and generating feature attribution graphs.
New Capability
Dream Diffusion Policy enables robots to survive severe OOD disturbances by detecting reality-imagination discrepancies and switching to an internal world model.
New Capability
Cortical Policy introduces a dual-stream view transformer inspired by the human brain's dorsal and ventral pathways to solve complex robotic manipulation.
Open Release
LongCat-Flash-Prover is a 560B MoE model that sets a new SOTA for open-weights formal reasoning, achieving a 97.1% pass rate on MiniF2F-Test.
Scaling Insight
Context-aware Visual Fine-tuning (CoVFT) allows a 7B MLLM to outperform its 13B counterpart by resolving optimization conflicts in vision encoders.