AI & ML

1625 papers · Page 6 of 17

Reveals that synthetic rewriting is a quality multiplier for high-grade data, but fails to fix low-quality source data.

Scaling Insight arxiv | Mar 27

Proves that stereo matching can reach state-of-the-art performance without the computationally heavy cost volumes used by almost all modern methods.

Breaks Assumption arxiv | Mar 27

Speeds up RL-based reasoning training by 1.7x using an online quality head to prune failing rollouts mid-generation.

Efficiency Breakthrough arxiv | Mar 27
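The mid-generation pruning idea can be sketched as a toy loop: an online quality head periodically scores the partial rollout and aborts it when success looks unlikely. All names here (`generate_with_pruning`, the random `quality` stand-in) are invented for illustration, not the paper's implementation:

```python
import random

def generate_with_pruning(sample_token, quality_head, max_len=64,
                          check_every=16, threshold=0.3):
    """Generate a rollout, aborting early when an online quality head
    predicts the partial trajectory is unlikely to succeed."""
    tokens = []
    for step in range(max_len):
        tokens.append(sample_token(tokens))
        # Periodically score the partial rollout and prune if hopeless.
        if (step + 1) % check_every == 0 and quality_head(tokens) < threshold:
            return tokens, False   # pruned mid-generation
    return tokens, True            # survived to full length

# Toy stand-ins: random token ids; "quality" = fraction of even tokens.
random.seed(0)
sample = lambda ts: random.randint(0, 9)
quality = lambda ts: sum(t % 2 == 0 for t in ts) / len(ts)
rollout, completed = generate_with_pruning(sample, quality)
```

Pruned rollouts free the generation budget for fresh attempts, which is where the claimed speedup would come from.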

Introduces a multi-answer RL objective that trains models to represent a distribution of valid answers in a single forward pass.

Paradigm Shift arxiv | Mar 27

Proves platform-determinism is necessary for trustworthy AI and implements an integer-only engine for bitwise identical inference across ARM and x86.

Breaks Assumption arxiv | Mar 27

Quantifies near-verbatim data extraction risk in LLMs at 1/5000th the computational cost of standard Monte Carlo methods.

New Capability arxiv | Mar 27

Enables graph-based retrieval and reranking for RAG without the maintenance overhead of a knowledge graph.

New Capability arxiv | Mar 27

Reduces visual tokens in robot policies by 78% by using inter-layer rank consistency instead of simple attention magnitude.

Breaks Assumption arxiv | Mar 27
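A minimal sketch of rank-consistency pruning, assuming per-layer attention mass over visual tokens is already available; the scoring rule and all names are illustrative, not the paper's:

```python
import numpy as np

def prune_by_rank_consistency(attn, keep_ratio=0.22):
    """attn: (layers, tokens) attention mass per visual token.
    Keep tokens whose per-layer importance *rank* is both high and
    stable across layers, instead of thresholding raw magnitude
    in a single layer."""
    ranks = attn.argsort(axis=1).argsort(axis=1)   # rank per layer (0 = least)
    mean_rank = ranks.mean(axis=0)                 # higher = more important
    stability = -ranks.std(axis=0)                 # penalize rank churn
    score = mean_rank + stability
    k = max(1, int(keep_ratio * attn.shape[1]))
    return np.sort(np.argsort(score)[-k:])         # indices of kept tokens

rng = np.random.default_rng(0)
attn = rng.random((12, 100))                       # 12 layers, 100 tokens
kept = prune_by_rank_consistency(attn)             # ~78% of tokens dropped
```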

This paper demonstrates that the order of training examples alone can encode information not present in any individual example, allowing models to bypass established sample complexity bounds.

Breaks Assumption arxiv | Mar 27

A systematic study reveals that grokking is not an architectural property of Transformers but an interaction between weight decay and optimization stability.

Scaling Insight arxiv | Mar 27

The 'Reasoning Contamination Effect' shows that Chain-of-Thought (CoT) reasoning actually disrupts a model's internal confidence signal, leading to poorer calibration.

Paradigm Shift arxiv | Mar 27

Large Language Models process instructions as social acts rather than technical specifications, making 'imperative mood' prompts behave inconsistently across different languages.

Breaks Assumption arxiv | Mar 27

GeoNDC introduces a queryable neural data cube that compresses 20 years of planetary satellite data by 95x while allowing on-demand continuous-time reconstruction.

New Capability arxiv | Mar 27

Sparton is a specialized Triton kernel that solves the massive memory bottleneck of Learned Sparse Retrieval (LSR) models like Splade.

Efficiency Breakthrough arxiv | Mar 27

Intern-S1-Pro is the first trillion-parameter scientific multimodal foundation model, outperforming proprietary models on specialized scientific reasoning.

New Capability arxiv | Mar 27

AirVLA successfully transfers manipulation-trained Vision-Language-Action (VLA) models to underactuated aerial robots using a payload-aware guidance mechanism.

New Capability arxiv | Mar 27

R1Sim applies the 'Reasoning-RL' paradigm (popularized by DeepSeek-R1) to traffic simulation, achieving superior safety and diversity in multi-agent behaviors.

Paradigm Shift arxiv | Mar 27

SIGMA resolves 'trajectory divergence' in molecular string generation by enforcing geometric symmetry recognition through contrastive learning.

Paradigm Shift arxiv | Mar 27

A fully differentiable agent-based traffic simulator enables calibration and control of million-vehicle networks 173x faster than real-time.

Efficiency Breakthrough arxiv | Mar 27

GIFT is a training-free frame selection framework that uses 'Directed Diversity' to boost Video-LLM performance by up to 12.5%.

Efficiency Breakthrough arxiv | Mar 27

Z-Erase introduces the first concept erasure method for single-stream diffusion transformers, preventing generation collapse in new unified architectures.

New Capability arxiv | Mar 27

This paper demonstrates that Sparse Autoencoder (SAE) features in multimodal models are not modular, challenging the core assumption of intervention-based steering.

Breaks Assumption arxiv | Mar 27

Pixelis shifts VLM reasoning from static description to a 'reasoning in pixels' agentic paradigm that learns via an executable tool grammar.

Paradigm Shift arxiv | Mar 27

The AE4E paradigm proposes a 'Social Contract' for multi-agent economies, replacing individual model alignment with an institutional 'Separation of Power'.

Paradigm Shift arxiv | Mar 27

MSRL scales multimodal reward modeling by transferring reasoning capabilities from text to vision-language tasks without requiring new multimodal preference data.

Scaling Insight arxiv | Mar 27

SEVerA enables the synthesis of self-evolving agents with formal guarantees by combining LLM planning with first-order logic rejection samplers.

New Capability arxiv | Mar 27

Using Signal Detection Theory, this work proves that LLM calibration and 'metacognitive efficiency' (knowing what you know) are distinct, dissociable capacities.

Paradigm Shift arxiv | Mar 27

Photon enables efficient 3D medical volume understanding through adaptive token scheduling and a novel 'gradient restoration' backpropagation rule.

Efficiency Breakthrough arxiv | Mar 27

Vision Hopfield Memory Networks (V-HMN) present a brain-inspired alternative to Transformers and Mamba using hierarchical associative memory mechanisms.

Paradigm Shift arxiv | Mar 27

Trace2Skill distills lessons from across a 'parallel fleet' of execution trajectories into a unified, conflict-free skill directory for LLM agents.

New Capability arxiv | Mar 27

Pruning low-utility prompts before RL rollouts allows for 10x more efficient training of large reasoning models.

Efficiency Breakthrough arxiv | Mar 27

Safety alignment does not have to be a 'tax' on performance; it can actually improve mathematical reasoning accuracy.

Breaks Assumption arxiv | Mar 27

Enables long video generation from short-video diffusion models without any additional training or fine-tuning.

New Capability arxiv | Mar 27

Training-free 6D pose estimation for unseen surgical instruments using only a CAD model as prior knowledge.

New Capability arxiv | Mar 27

Offline Decision Transformers can now synthesize strategies that surpass the classical heuristics they were trained on for the Traveling Salesman Problem.

New Capability arxiv | Mar 27

Simple image sharpening serves as a surrogate-free, zero-cost preemptive defense against adversarial attacks.

Efficiency Breakthrough arxiv | Mar 27
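The defense amounts to a standard unsharp-mask filter applied before classification; this toy NumPy version is a sketch of the general technique, not the paper's exact pipeline:

```python
import numpy as np

def sharpen(img, alpha=1.0):
    """Unsharp masking: boost high-frequency detail, disrupting the
    small, carefully tuned perturbations adversarial attacks rely on,
    at essentially zero inference cost."""
    # 3x3 box blur via edge-padding and neighborhood averaging.
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    blur = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return np.clip(img + alpha * (img - blur), 0.0, 1.0)

img = np.zeros((8, 8))
img[3:5, 3:5] = 1.0            # toy image with a sharp edge
out = sharpen(img)
```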

Representing GPS trajectories as hyperspectral images enables multi-month dense anomaly detection that was previously computationally intractable.

Paradigm Shift arxiv | Mar 27

A foundation model for gait transforms 3D skeletal motion into a systemic biosignal for multi-system health monitoring.

New Capability arxiv | Mar 27

A new tokenization architecture reduces the 'Token Tax' for complex non-Latin scripts by over 60%.

Efficiency Breakthrough arxiv | Mar 27

Sparse Autoencoder analysis reveals that weight pruning counter-intuitively preserves rare features better than frequent ones.

Breaks Assumption arxiv | Mar 27

LLMs can be fine-tuned to act as their own 'Z-token' compressors, achieving 18x text reduction without losing reconstruction fidelity.

New Capability arxiv | Mar 27

GlowQ introduces group-shared low-rank approximations to speed up quantized LLM inference by up to 37%.

Efficiency Breakthrough arxiv | Mar 27

Defines 'Reasoning Safety' as a new security dimension and introduces a real-time monitor to detect logic-chain hijackings.

New Capability arxiv | Mar 27

Cross-model disagreement (CMP/CME) provides a highly effective, label-free signal for detecting confident hallucinations.

Breaks Assumption arxiv | Mar 27
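The core signal can be sketched with a simple majority check across independently queried models; the threshold and normalization here are hypothetical choices, not the paper's:

```python
from collections import Counter

def flag_hallucination(answers, agreement_threshold=0.5):
    """Label-free check: if independently queried models disagree on
    an answer, treat the majority answer as a likely confident
    hallucination rather than settled fact."""
    normalized = [a.strip().lower() for a in answers]
    top, count = Counter(normalized).most_common(1)[0]
    agreement = count / len(normalized)
    return agreement < agreement_threshold, top

# Four models, four different answers: strong disagreement signal.
suspect, majority = flag_hallucination(["Paris", "Lyon", "Nice", "Rome"])
```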

Introduces a training-free pipeline for pixel-level video anomaly detection that achieves a 5x improvement in object-level accuracy.

New Capability arxiv | Mar 27

A model-agnostic framework to extract the model-implied causal structure from any trained temporal predictor.

New Capability arxiv | Mar 27

Reduces LLM inference energy by 40% (and up to 81%) using a distillation-based router to skip unnecessary reasoning steps.

Efficiency Breakthrough arxiv | Mar 27
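The routing logic reduces to a cheap difficulty gate in front of the expensive reasoning path. This sketch uses an invented length-based difficulty proxy purely for illustration; the paper's router is a distilled model:

```python
def route(query, difficulty_score, budget_threshold=0.5):
    """Energy-saving router: queries a lightweight scorer deems easy
    go to a direct-answer path; only genuinely hard ones pay for the
    long multi-step reasoning trace."""
    if difficulty_score(query) < budget_threshold:
        return "direct"     # skip reasoning: ~1x token cost
    return "reasoning"      # full chain-of-thought: many-x token cost

# Toy difficulty proxy: longer questions count as harder.
score = lambda q: min(1.0, len(q.split()) / 20)
path = route("What is 2 + 2?", score)
```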

Detects when object detectors fail to see safety-critical objects by measuring semantic misalignment with foundation model embeddings.

New Capability arxiv | Mar 27

Challenges the 'Golden Data' requirement for video generation by showing that imbalanced data can outperform high-quality data through timestep-aware training.

Breaks Assumption arxiv | Mar 27

Unlocks full-body musculoskeletal humanoid training by achieving order-of-magnitude speedups via massively parallel GPU simulation.

Efficiency Breakthrough arxiv | Mar 27

Fixes the inherent instability of on-policy distillation in LLMs using local support matching and top-p rollout sampling.

Paradigm Shift arxiv | Mar 27

Achieves 45% performance gains in robotics using 5-10x fewer real-world demonstrations through high-dimensional factorization.

Efficiency Breakthrough arxiv | Mar 27

Enables LMMs to 'think' using compact latent visual representations rather than verbalizing everything into text.

Paradigm Shift arxiv | Mar 27

Translates a single natural language sentence into a validated, hardware-specific computational imaging system design.

New Capability arxiv | Mar 27

Achieves up to 4.7x speedup for Diffusion LLMs using a training-free self-speculative decoding framework.

Efficiency Breakthrough arxiv | Mar 27

Generates 2-minute 480p videos on a single H200 GPU through a hierarchical KV-cache strategy that compresses context by 32x.

Efficiency Breakthrough arxiv | Mar 27

Introduces the concept of a 'trainable' knowledge base for RAG that improves performance by distilling and writing back compact knowledge units.

Paradigm Shift arxiv | Mar 27

Enables 4K novel view synthesis in a feed-forward manner by decoupling geometric complexity from rendering resolution.

Efficiency Breakthrough arxiv | Mar 27

A training-free decoding framework that mitigates multimodal hallucinations by re-ranking tokens based on spatial attention entropy.

New Capability arxiv | Mar 27
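The re-ranking step can be sketched directly: compute each candidate token's attention entropy over image patches and subtract it from the logit, so diffusely grounded (likely hallucinated) tokens lose rank. The penalty weight and shapes are assumptions:

```python
import numpy as np

def rerank_by_attention_entropy(logits, patch_attn, beta=1.0):
    """Down-weight candidate tokens whose attention over image patches
    is diffuse (high entropy), i.e. poorly grounded in the image."""
    p = patch_attn / patch_attn.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)   # per candidate
    return logits - beta * entropy                   # adjusted scores

logits = np.array([2.0, 2.0])
# Candidate 0 attends sharply to one patch; candidate 1 is diffuse.
attn = np.array([[0.97, 0.01, 0.01, 0.01],
                 [0.25, 0.25, 0.25, 0.25]])
scores = rerank_by_attention_entropy(logits, attn)
```

With equal logits, the sharply grounded candidate now outranks the diffuse one.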

Demonstrates that general-purpose coding agents can achieve 20x speedups in hardware design optimization without domain-specific training.

Efficiency Breakthrough arxiv | Mar 27

Introduces a 'Hybrid Memory' architecture that maintains the identity and motion of dynamic subjects even when they hide out of view.

New Capability arxiv | Mar 27

Achieves state-of-the-art compositionality in vision-language models without hard negative mining and without degrading zero-shot performance.

Breaks Assumption arxiv | Mar 27

Uses cycle-consistency as a label-free reward signal for reinforcement learning to resolve contradictions in multimodal reasoning.

Paradigm Shift arxiv | Mar 27
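The reward structure is a round trip: map the input through the forward task, invert it with the backward task, and score how much of the original survives. This toy version with invented set-overlap similarity shows the shape of the idea, not the paper's models:

```python
def cycle_consistency_reward(forward, backward, similarity, x):
    """Label-free RL reward: the round trip x -> forward -> backward
    should reproduce x; the similarity of the reconstruction to the
    original is the reward, no human labels needed."""
    return similarity(x, backward(forward(x)))

# Toy round trip: the forward task drops odd items (a lossy 'caption'),
# the backward task echoes its input back.
forward = lambda xs: [v for v in xs if v % 2 == 0]
backward = lambda ys: list(ys)
sim = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))
reward = cycle_consistency_reward(forward, backward, sim, [2, 4, 6])
```

A lossless round trip earns full reward; a contradictory or lossy one is penalized automatically.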

A training-free enhancement that unlocks multi-scale synergies in Vision Foundation Models (VFMs) to boost performance across various tasks.

Efficiency Breakthrough arxiv | Mar 27

Researchers are turning satellites into high-security vaults in space that are all but impossible to hack from down here on Earth.

Practical Magic arxiv | Mar 26

For 30 years, nobody knew the absolute limit of how much a machine can learn. Someone has finally pinned it down.

Paradigm Challenge arxiv | Mar 26

Forget metal antennas: scientists just built a 'quantum radio' from a cloud of atoms that dramatically outperforms them.

Practical Magic arxiv | Mar 26

Engineers figured out how to make radio waves swerve around anyone trying to eavesdrop on your signal.

Practical Magic arxiv | Mar 26

Weirdly enough, AI trained on synthetic data actually predicts real pandemics better than AI trained on actual history.

Paradigm Challenge arxiv | Mar 26

Frontier models like GPT-5.2 and Claude 4.5 suffer from 'Internal Safety Collapse' where safety alignment fails completely if a task's success necessitates harmful output.

Breaks Assumption arxiv | Mar 26

Berta is an open-source, production-proven AI clinical scribe that reduces operating costs by up to 95% compared to commercial alternatives.

Open Release arxiv | Mar 26

Memory Sparse Attention (MSA) enables LLMs to scale to 100 million tokens with linear complexity and less than 9% precision degradation.

Efficiency Breakthrough arxiv | Mar 26

Prompt compression can paradoxically increase total energy consumption and cost by over 2000% due to aggressive model 'output expansion'.

Breaks Assumption arxiv | Mar 26
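The accounting behind the paradox is simple: output tokens are priced (and powered) several times higher than input tokens, so a compressed prompt that elicits a longer answer can cost more overall. The prices and token counts below are hypothetical illustrations:

```python
def request_cost(in_tokens, out_tokens, p_in=1.0, p_out=4.0):
    """Total cost in arbitrary units; output tokens are typically
    several times pricier (and more energy-hungry) than input tokens."""
    return in_tokens * p_in + out_tokens * p_out

# 'Output expansion': compressing the prompt 4x, but the terser prompt
# elicits a 3x longer answer, and the total bill goes *up*.
full = request_cost(in_tokens=2000, out_tokens=300)       # 3200.0
compressed = request_cost(in_tokens=500, out_tokens=900)  # 4100.0
```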

Synthetic Mixed Training allows an 8B model to finally outperform RAG on long-document comprehension by combining synthetic QAs with rewritten documents.

Scaling Insight arxiv | Mar 26

Logical reasoning in LLMs is causally linked to 'algebraic divergence' in the residual stream, and failure to achieve this geometry explains sycophancy.

Paradigm Shift arxiv | Mar 26

Environment Maps nearly double the success rate of long-horizon agents by replacing session-bound context with a persistent, structured graph representation.

Paradigm Shift arxiv | Mar 26

A statistical physics framework that predicts the fundamental limits of agentic self-improvement and nested LLM architectures.

Paradigm Shift arxiv | Mar 26

Inference-time 'steering' of Code LLMs allows for precise control over programming languages and libraries without prompting or fine-tuning.

New Capability arxiv | Mar 26

The first sorting-free stochastic formulation for 3D Gaussian Splatting that matches rasterization speed while enabling full ray-traced effects.

Efficiency Breakthrough arxiv | Mar 26

Bio-inspired visual servoing that achieves low-latency robotic control by processing event-stream flux directly, bypassing traditional state estimation.

Paradigm Shift arxiv | Mar 26

Training-free Out-of-Distribution (OOD) detection that beats state-of-the-art by aggregating features across intermediate network layers.

Breaks Assumption arxiv | Mar 26
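A minimal sketch of the layer-aggregation idea, using a naive per-layer distance to in-distribution feature means; the real method's distance and aggregation are more sophisticated, and everything named here is an assumption:

```python
import numpy as np

def ood_score(layer_feats, layer_means):
    """Sum per-layer distances between a sample's features and the
    in-distribution mean; aggregating across intermediate layers
    catches shifts that no single layer reveals on its own."""
    return sum(np.linalg.norm(f - m) for f, m in zip(layer_feats, layer_means))

rng = np.random.default_rng(0)
means = [np.zeros(8) for _ in range(4)]              # fitted on ID data
in_dist = [rng.normal(0, 0.1, 8) for _ in range(4)]  # near the means
out_dist = [rng.normal(2, 0.1, 8) for _ in range(4)] # shifted sample
s_in = ood_score(in_dist, means)
s_out = ood_score(out_dist, means)
```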

Newer LLM architectures like MoE and SSMs are making 'early-exit' decoding significantly less effective than in previous generations.

Scaling Insight arxiv | Mar 26

AI agent benchmarks can be slashed by ~50% in cost by only evaluating on tasks with intermediate historical pass rates.

Efficiency Breakthrough arxiv | Mar 26
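The selection rule is just a band-pass filter on historical pass rates: tasks nearly every agent passes (or fails) barely separate models, so they can be dropped. The bounds below are illustrative, not the paper's:

```python
def select_informative_tasks(pass_rates, lo=0.2, hi=0.8):
    """Keep only tasks with intermediate historical pass rates; tasks
    at the extremes contribute little to model rankings, so skipping
    them cuts evaluation cost roughly in half."""
    return [t for t, r in pass_rates.items() if lo <= r <= hi]

history = {"t1": 0.05, "t2": 0.50, "t3": 0.95, "t4": 0.30}
subset = select_informative_tasks(history)
```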

A universal 'one-shot' medical anomaly detector that outperforms specialized models across nine different datasets.

New Capability arxiv | Mar 26

Grokking is not the discovery of a new algorithm, but the sharpening of one already latent in the model during the memorization phase.

Breaks Assumption arxiv | Mar 26

Diffusion models can be proven to generalize by capturing manifold geometry long before they achieve density estimation or memorization.

Scaling Insight arxiv | Mar 26

Sparse Autoencoders (SAEs) can successfully decompose opaque medical vision foundation model embeddings into human-interpretable clinical concepts.

New Capability arxiv | Mar 26

A massive empirical study of 177,000 tools reveals a rapid shift in the AI agent ecosystem from 'perception' to 'action' (27% to 65% usage).

Paradigm Shift arxiv | Mar 26

A simple perturbation method reveals that representations are not just activation patterns, but conduits that determine how learning 'infects' similar examples.

Paradigm Shift arxiv | Mar 26

LLMs can solve planning problems with state spaces as large as 10^165 by acting as program generators rather than direct planners.

Paradigm Shift arxiv | Mar 26

Symbolic-KANs bridge the gap between scalable deep learning and interpretable symbolic regression by embedding discrete library primitives directly into the network.

New Capability arxiv | Mar 26

Transformer hallucinations in high-stakes legal tasks are deterministic failures driven by calculable internal state thresholds rather than random 'glitches'.

Breaks Assumption arxiv | Mar 26

An 'invariant compiler' uses LLMs to translate physics requirements into Neural ODE architectures that satisfy conservation laws by construction.

New Capability arxiv | Mar 26

Hybrid Distillation Policy Optimization (HDPO) overcomes the 'vanishing gradient' problem for hard mathematical prompts that RL agents cannot solve.

Efficiency Breakthrough arxiv | Mar 26

BioVITA releases a massive multimodal biological dataset of 3.6M image-audio-text samples covering 14,000 species.

Open Release arxiv | Mar 26

A self-distillation method for Multi-Token Prediction (MTP) that yields a 220% inference speedup with minimal training cost.

Efficiency Breakthrough arxiv | Mar 26

AttentionPack achieves up to 8x memory efficiency during decoding for large vision-language models (VLMs).

Efficiency Breakthrough arxiv | Mar 26

POISE demonstrates the first autonomous, evidence-driven discovery of improved policy optimization algorithms for LLMs.

New Capability arxiv | Mar 26

Listed API prices for reasoning models (RLMs) are shown to be highly misleading, with cheaper models often costing 28x more in practice.

Breaks Assumption arxiv | Mar 26
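The gap between listed and effective price comes from tokens actually consumed per solved task. This back-of-envelope sketch uses entirely hypothetical numbers to show how a 10x cheaper per-token price can still lose:

```python
def effective_cost(price_per_mtok, tokens_per_attempt, pass_rate):
    """Expected cost to obtain one correct answer: a low per-token
    price is swamped by long reasoning traces and retries."""
    attempts = 1.0 / pass_rate   # expected tries until success
    return price_per_mtok * tokens_per_attempt / 1e6 * attempts

# Hypothetical: the 'cheap' model lists a 10x lower token price but
# burns 10x the tokens per attempt and fails three times out of four.
cheap = effective_cost(price_per_mtok=0.5, tokens_per_attempt=40_000, pass_rate=0.25)
pricey = effective_cost(price_per_mtok=5.0, tokens_per_attempt=4_000, pass_rate=0.80)
```

Under these assumptions the nominally cheaper model ends up several times more expensive per solved task.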

SLAT-Phys predicts spatially varying material property fields directly from single RGB images with a 120x speedup.

Efficiency Breakthrough arxiv | Mar 26