Papers in machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI whose core contribution is computational intelligence.
Breaks Assumption
Frontier models like GPT-5.2 and Claude 4.5 suffer from 'Internal Safety Collapse' where safety alignment fails completely if a task's success necessitates harmful output.
Open Release
Berta is an open-source, production-proven AI clinical scribe that reduces operating costs by up to 95% compared to commercial alternatives.
Efficiency Breakthrough
Memory Sparse Attention (MSA) enables LLMs to scale to 100 million tokens with linear complexity and less than 9% precision degradation.
Breaks Assumption
Prompt compression can paradoxically increase total energy consumption and cost by over 2000%, because compressed prompts trigger aggressive 'output expansion' by the model.
Scaling Insight
Synthetic Mixed Training allows an 8B model to finally outperform RAG on long-document comprehension by combining synthetic QAs with rewritten documents.
Paradigm Shift
Logical reasoning in LLMs is causally linked to 'algebraic divergence' in the residual stream, and failure to achieve this geometry explains sycophancy.
Paradigm Shift
Environment Maps nearly double the success rate of long-horizon agents by replacing session-bound context with a persistent, structured graph representation.
Paradigm Shift
A statistical physics framework that predicts the fundamental limits of agentic self-improvement and nested LLM architectures.
New Capability
Inference-time 'steering' of Code LLMs allows for precise control over programming languages and libraries without prompting or fine-tuning.
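The headline's "steering" idea in miniature: add a fixed direction to a hidden state at inference time to bias generation (e.g. toward one programming language), with no prompting or weight updates. This toy forward pass, the steering direction, and the scaling factor are illustrative assumptions, not the paper's actual intervention.

```python
# Minimal activation-steering sketch (hypothetical; not the paper's method).
import numpy as np

def steered_hidden(h, steer_vec, alpha=2.0):
    """Shift a hidden state along a precomputed steering direction."""
    return h + alpha * steer_vec

rng = np.random.default_rng(1)
hidden = rng.normal(size=8)  # hidden state at some intermediate layer
# A steering direction might be computed as, e.g.,
# mean(activations on Python code) - mean(activations on Java code).
direction = np.array([1, 0, 0, 0, 0, 0, 0, 0], dtype=float)

out = steered_hidden(hidden, direction)
print(round(float(out[0] - hidden[0]), 1))  # → 2.0
```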
Efficiency Breakthrough
The first sorting-free stochastic formulation for 3D Gaussian Splatting that matches rasterization speed while enabling full ray-traced effects.
Paradigm Shift
Bio-inspired visual servoing that achieves low-latency robotic control by processing event-stream flux directly, bypassing traditional state estimation.
Breaks Assumption
Training-free Out-of-Distribution (OOD) detection that beats state-of-the-art by aggregating features across intermediate network layers.
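A minimal sketch of the idea of aggregating OOD evidence across intermediate layers rather than relying on the final layer alone. The k-NN distance score, the simple averaging rule, and the synthetic features are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical layer-aggregated OOD scoring sketch.
import numpy as np

def layer_knn_score(feat, bank, k=5):
    """Mean distance from one feature vector to its k nearest in-distribution neighbors."""
    d = np.linalg.norm(bank - feat, axis=1)
    return np.sort(d)[:k].mean()

def ood_score(features_per_layer, banks_per_layer):
    """Average per-layer novelty scores; higher = more out-of-distribution."""
    return float(np.mean([layer_knn_score(f, b)
                          for f, b in zip(features_per_layer, banks_per_layer)]))

rng = np.random.default_rng(0)
banks = [rng.normal(0, 1, (200, 16)) for _ in range(3)]  # ID feature banks, 3 layers
in_dist = [rng.normal(0, 1, 16) for _ in range(3)]       # sample from the same distribution
far_ood = [rng.normal(6, 1, 16) for _ in range(3)]       # sample from a shifted distribution

print(ood_score(in_dist, banks) < ood_score(far_ood, banks))  # → True
```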
Scaling Insight
Newer LLM architectures like MoE and SSMs are making 'early-exit' decoding significantly less effective than in previous generations.
Efficiency Breakthrough
AI agent benchmark costs can be cut by ~50% by evaluating only on tasks with intermediate historical pass rates.
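The selection rule behind this headline can be sketched in a few lines: keep only tasks whose historical pass rate is intermediate, on the assumption that near-always-solved and near-never-solved tasks carry little discriminative signal. The task names, pass rates, and the 0.2/0.8 thresholds are illustrative, not from the paper.

```python
# Hypothetical benchmark-pruning sketch: keep only "intermediate difficulty" tasks.
def select_informative_tasks(pass_rates, low=0.2, high=0.8):
    """Keep tasks whose historical pass rate falls strictly inside (low, high)."""
    return [task for task, rate in pass_rates.items() if low < rate < high]

historical = {
    "web_nav_easy": 0.97,      # almost every agent passes -> little signal
    "code_repair": 0.55,       # intermediate -> discriminative
    "multi_hop_qa": 0.41,      # intermediate -> discriminative
    "adversarial_math": 0.03,  # almost no agent passes -> little signal
}

subset = select_informative_tasks(historical)
print(subset)  # → ['code_repair', 'multi_hop_qa']
```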
New Capability
A universal 'one-shot' medical anomaly detector that outperforms specialized models across nine different datasets.
Breaks Assumption
Grokking is not the discovery of a new algorithm, but the sharpening of one already latent in the model during the memorization phase.
Scaling Insight
Diffusion models can be proven to generalize by capturing manifold geometry long before they achieve density estimation or memorization.
New Capability
Sparse Autoencoders (SAEs) can successfully decompose opaque medical vision foundation model embeddings into human-interpretable clinical concepts.
Paradigm Shift
A massive empirical study of 177,000 tools reveals a rapid shift in the AI agent ecosystem from 'perception' to 'action' (27% to 65% usage).
Paradigm Shift
A simple perturbation method reveals that representations are not just activation patterns, but conduits that determine how learning 'infects' similar examples.
Paradigm Shift
LLMs can solve planning problems with state spaces as large as 10^165 by acting as program generators rather than direct planners.
New Capability
Symbolic-KANs bridge the gap between scalable deep learning and interpretable symbolic regression by embedding discrete library primitives directly into the network.
Breaks Assumption
Transformer hallucinations in high-stakes legal tasks are deterministic failures driven by calculable internal state thresholds rather than random 'glitches'.
New Capability
An 'invariant compiler' uses LLMs to translate physics requirements into Neural ODE architectures that satisfy conservation laws by construction.
Efficiency Breakthrough
Hybrid Distillation Policy Optimization (HDPO) overcomes the 'vanishing gradient' problem for hard mathematical prompts that RL agents cannot solve.
Open Release
BioVITA releases a massive multimodal biological dataset of 3.6M image-audio-text samples covering 14,000 species.
Efficiency Breakthrough
A self-distillation method for Multi-Token Prediction (MTP) that yields a 220% inference speedup with minimal training cost.
Efficiency Breakthrough
AttentionPack achieves up to 8x memory efficiency during decoding for large vision-language models (VLMs).
New Capability
POISE demonstrates the first autonomous, evidence-driven discovery of improved policy optimization algorithms for LLMs.
Breaks Assumption
Listed API prices for reasoning models (RLMs) are shown to be highly misleading, with cheaper models often costing 28x more in practice.
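The arithmetic behind why listed prices mislead: a model that is cheap per token but emits many more reasoning tokens can cost more per solved task than an expensive, terse one. The prices and token counts below are made up for illustration and are not the paper's measurements.

```python
# Illustrative effective-cost calculation (hypothetical numbers).
def cost_per_task(price_per_mtok, tokens_per_task):
    """Effective dollar cost of one task given a per-million-token price."""
    return price_per_mtok * tokens_per_task / 1_000_000

cheap  = cost_per_task(price_per_mtok=0.5, tokens_per_task=400_000)  # verbose reasoner
pricey = cost_per_task(price_per_mtok=10.0, tokens_per_task=5_000)   # terse reasoner

print(cheap > pricey, round(cheap / pricey, 1))  # → True 4.0
```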
Efficiency Breakthrough
SLAT-Phys predicts spatially varying material property fields directly from single RGB images with a 120x speedup.
Paradigm Shift
LLM-generated summaries can produce patient embeddings that are more 'portable' and robust to hospital distribution shifts than specialized clinical models.
Breaks Assumption
A systematic critique explaining why 'self-improving' generative optimization loops fail in production and how to fix them.
New Capability
SDZE enables the training of 10-million-dimensional Physics-Informed Neural Networks (PINNs) on a single GPU.
Efficiency Breakthrough
Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.
New Capability
Solves the 'vanishing gradient' problem in 3D Gaussian Splatting (3DGS) tracking by optimizing in the frequency domain using spectral moments.
New Capability
Restores editable, semantically layered structures from flattened vector graphics (SVGs/icons) by using generative completion to recover occluded geometries.
Efficiency Breakthrough
MoE-Sieve reduces Mixture-of-Experts LoRA fine-tuning parameters and training time by ~70% by only adapting the most-frequently activated 'hot' experts.
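The "hot expert" idea sketched: profile how often each MoE expert is routed to on a calibration set, then attach LoRA adapters only to the most frequently activated experts and leave the rest frozen. The routing trace, the keep fraction, and the top-k selection rule are illustrative assumptions; MoE-Sieve's actual criterion may differ.

```python
# Hypothetical hot-expert selection for selective MoE LoRA fine-tuning.
from collections import Counter

def select_hot_experts(routing_trace, keep_fraction=0.3):
    """Return the expert ids covering the top `keep_fraction` of experts by usage."""
    counts = Counter(routing_trace)
    n_keep = max(1, round(len(counts) * keep_fraction))
    return [expert for expert, _ in counts.most_common(n_keep)]

# Simulated router decisions over a calibration batch (one expert id per token).
trace = [0] * 50 + [3] * 30 + [1] * 12 + [2] * 5 + [4] * 3
hot = select_hot_experts(trace, keep_fraction=0.4)  # keep 2 of the 5 experts
print(hot)  # → [0, 3]
```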
New Capability
Identifies that 'attention imbalance' across modalities and tokens drives object hallucinations and proposes a decoding-time rectification (AIR) to fix it.
New Capability
SOMA provides a plug-and-play memory and orchestration system that increases Vision-Language-Action (VLA) robot success rates by over 50% without fine-tuning.
Breaks Assumption
LLMpedia exposes a massive gap in LLM factuality by generating 1M articles from parametric memory, revealing that actual knowledge retrieval is 15%+ lower than multiple-choice benchmarks suggest.
Breaks Assumption
Proves that RLHF and DPO alignment cause 'response homogenization,' which effectively breaks standard sampling-based uncertainty estimation methods.
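Why homogenization breaks sampling-based uncertainty, in miniature: these methods read the spread of repeated samples as a confidence signal, so if alignment collapses the answer distribution onto one phrasing, the empirical entropy goes to zero regardless of whether the model is right. The toy samples below are illustrative, not from the paper.

```python
# Illustrative sketch: empirical entropy over sampled answers as an uncertainty proxy.
from collections import Counter
import math

def sample_entropy(answers):
    """Shannon entropy (bits) of the empirical distribution over sampled answers."""
    counts = Counter(answers)
    n = len(answers)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

base_model = ["Paris", "Lyon", "Paris", "Marseille"]   # diverse samples -> usable signal
aligned    = ["Paris.", "Paris.", "Paris.", "Paris."]  # homogenized -> signal destroyed

print(sample_entropy(base_model), sample_entropy(aligned))  # → 1.5 0.0
```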
Paradigm Shift
Formalizes 'likelihood hacking,' a failure mode where RL-trained models learn to generate unnormalized probabilistic programs to artificially inflate rewards.
Efficiency Breakthrough
Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.
Efficiency Breakthrough
Enables 1000x faster on-chip training for Weightless Neural Networks (WNNs) on FPGAs with drastically lower power consumption.
Scaling Insight
Provides a systematic blueprint for scaling Reinforcement Learning (RL) in LLMs using multi-turn synthetic data generation and difficulty-based curricula.
Paradigm Shift
A model-agnostic framework to boost time-series forecasting by aligning internal representations with those of pretrained foundation models.
New Capability
Breaks the resolution and aspect ratio barriers of image diffusion models, enabling the generation of consistent 32K resolution images.
Paradigm Shift
Unifies input and predicted meshes under a shared topological framework to enable high-fidelity 3D reconstruction with sharp features.
Open Release
Releases a high-quality, 92K-sentence parallel dataset for Hindi-Sanskrit translation focusing on contemporary and spoken language.