AI & Machine Learning

2,557 papers

Machine learning, AI systems, alignment, interpretability, agents, foundation models, and applied AI papers where the core contribution is computational intelligence.

Paradigm Challenge

If you change just one tiny ingredient in an AI’s training, you can break the whole thing without a single warning light going off.

Practical Magic

Forget weighing yourself every morning—recording a quick voice memo could be way better at spotting a heart failure flare-up before it happens.

Practical Magic

Imagine headphones that let you 'mute' a crying baby or a leaf blower while keeping the rest of the world sounding perfectly clear.

Paradigm Challenge

If you mash two 'safe' AI models together, you can accidentally create a dangerous one—turns out you can hide a trap by splitting it across separate files.

Nature Is Weird

A top AI coding tool leaked its own secret source code because the developers got lazy and just trusted the code the AI wrote for its own setup.

Paradigm Challenge

We found a way to send data faster than a physical 'speed limit' that everyone thought was impossible to break.

Paradigm Challenge

The math formula the World Bank has used for 40 years to measure global poverty has been proven to be logically impossible.

Practical Magic

We found a way to run stats in 'superposition,' so a computer can check every possible version of a dataset at the same time.

Efficiency Breakthrough

Recovers short-text performance in context-extended LLMs using 60x less data than current state-of-the-art distillation methods.

First foundation model to unify text, image, audio, and video using native masked diffusion instead of autoregressive serialization.

Breaks Assumption

Discovers that post-training reasoning models mask rather than delete safety mechanisms, allowing their restoration with lightweight adapters.

Efficiency Breakthrough

Introduces entropy-guided adaptive decoding that gives small models reasoning performance comparable to frontier models at a fraction of the cost.

Breaks Assumption

Proves that 'inverse scaling' on many benchmarks is a prompt-dependent artifact caused by verbosity, which can be reversed by forcing brevity.

Enables reinforcement learning for long-horizon robots across diverse tasks without requiring manual reward engineering.

Efficiency Breakthrough

Proposes a 'no-backprop' stochastic process memory for edge agents that solves the retention-forgetting tradeoff with fixed compute.

Breaks Assumption

Mathematically and empirically proves that classifier-based safety gates are fundamentally incapable of monitoring self-improving AI.

First generative model capable of synthesizing physically consistent 'raw' camera sensor data from text prompts or sRGB images.

A production-ready adaptive router for LLM portfolios that manages cost-quality trade-offs in real-time under strict dollar budgets.

Breaks Assumption

Masked Image Modeling (MIM) representations are fundamentally polluted with non-semantic noise, which can be fixed with a zero-cost post-hoc linear projection.

Breaks Assumption

Standard alignment metrics like CKA and RSA systematically fail when comparing networks in superposition, often leading to false conclusions about model similarity.

Scaling Insight

Neural collapse is triggered by a predictable 'feature-norm threshold' (fn*) that is invariant to training conditions, serving as a new diagnostic for training progress.

Efficiency Breakthrough

MAC-Attention achieves 14x attention-phase speedups and reduces KV cache accesses by 99% for long-context LLMs by reusing computation from semantically similar queries.

Efficiency Breakthrough

A modified 110M parameter ColBERT model can identify fine-grained evidence spans as accurately as a 27B parameter LLM, but at a fraction of the cost.

LLM-guided program evolution has discovered a new data-shuffling rule for SGD that provably and empirically outperforms standard Random Reshuffling.

Breaks Assumption

Self-reflective prompting (self-correction) fails to improve accuracy in safety-critical medical QA, frequently introducing new errors rather than fixing old ones.

Breaks Assumption

The 'modality gap' in Vision-Language Models is composed of two distinct geometric components, and the commonly used 'raw gap' is a misleading metric for cross-modal quality.

High-quality oversight of massive proprietary LLM agents can be achieved by small, open-source 'critics' that intervene in real-time within the same interaction.

Reduces multimodal jailbreak success rates by 97% using a simple conditional decoding strategy without task-specific fine-tuning.

A comprehensive analysis of AI safety vulnerabilities including automated circuit discovery, latent adversarial training, and power-law scaling of jailbreak success.

Efficiency Breakthrough

A lightweight framework for triaging agentic trajectories post-deployment without the cost of human review or auxiliary LLM calls.

Independently reproduces OpenAI's gpt-oss-20b scores by reverse-engineering undisclosed tool-calling formats and agent harnesses.

Reconstructs authentic LiDAR point clouds under jamming attacks with a 92% success rate by exploiting raw full-waveform representations.

Identifies a fundamental quality-exploration dilemma in Diffusion Language Models where remasking improves single-sample quality but kills reasoning diversity.

Scaling Insight

Gradient-based data valuation (TracIn) outperforms all human-crafted metadata heuristics for ordering curriculum learning in motion planners.

Introduces training-free and model-free trajectory planning by computing diffusion score functions directly from data libraries via kernel-weighted estimation.

Breaks Assumption

Foundational deep networks consistently assign higher density to simpler images, regardless of training data or architecture complexity.

Efficiency Breakthrough

A cross-graph tuning-free prompting framework for GNNs that achieves massive gains on unseen graphs without retraining.

Proposes a decision-centric architecture that separates signal estimation from control policy to make LLM system decisions explicit and inspectable.

Enables zero-shot humanoid navigation in unseen environments using only 5 hours of human walking data and no robot-specific data.

A white-box membership inference attack using 'gradient-induced feature drift' to outperform all existing confidence-based methods.

Efficiency Breakthrough

Self-Routing removes the need for learned routers in Mixture-of-Experts (MoE) by using hidden states directly for expert assignment.

Efficiency Breakthrough

Improves Qwen2.5-7B performance on AIME2024 by 137% through test-time iterative rethinking and majority-voted pseudo-labels.

Efficiency Breakthrough

Automates mathematical optimization modeling using reinforcement learning with solver-derived rewards instead of human process supervision.

Breaks Assumption

Reveals that many 'polysemantic' neurons in LLMs actually fire on shared word forms (a lexical effect) rather than on compressed semantic concepts.

Truth Anchoring (TAC) provides a post-hoc calibration method to align LLM uncertainty metrics with actual factual correctness.

Scaling Insight

Demonstrates that LLM judge panels follow power-law discovery curves, where panel size and persona diversity are critical for uncovering edge-case failures.

Identifies 'diversity collapse' in the popular GRPO reinforcement learning method and introduces MUPO to maintain broad reasoning paths.

Introduces the first autoregressive framework for Gaussian Splatting, enabling parallel, progressive next-scale 3D generation.

Efficiency Breakthrough

Optimizes LLM inference scheduling by treating output length as a heavy-tailed distribution rather than a point estimate.

Efficiency Breakthrough

Introduces negative early exit and adaptive boosting to make Monte Carlo Tree Search (MCTS) practical for real-time LLM inference.