AI & ML


Proposes modeling the world in the feature space of frozen geometry foundation models instead of pixels, achieving 5x faster depth forecasting.

Paradigm Shift arxiv | Mar 16

A retrosynthesis model that explicitly learns strategic bond-disconnection reasoning via reinforcement learning with a round-trip accuracy reward.

New Capability arxiv | Mar 16

Longitudinal evidence reveals that successive ChatGPT versions are converging in output diversity, suggesting potential model collapse from synthetic data saturation.

Scaling Insight arxiv | Mar 16

A new system enables humanoid robots to play competitive tennis rallies with humans by learning from imperfect, fragmented motion data.

New Capability arxiv | Mar 16

Adversarial test case evolution improves code reinforcement learning by creating harder, more discriminative verification signals that drive better model performance.

Scaling Insight arxiv | Mar 16

Modality-level disaggregation enables cost-optimal MLLM serving across heterogeneous GPUs over commodity PCIe, bypassing the need for expensive NVLink interconnects.

Efficiency Breakthrough arxiv | Mar 16

Probing of Vision-Language-Action (VLA) models reveals that the action decoder largely ignores the reasoning logic in Chain-of-Thought, relying almost exclusively on object names.

Breaks Assumption arxiv | Mar 16

SciDesignBench provides a massive simulator-grounded environment for scientific inverse design, revealing that current LLMs struggle significantly with iterative refinement.

New Capability arxiv | Mar 16

A hardware-algorithm co-design for Spiking Neural Networks achieves up to 69x energy efficiency gains using an SRAM-based Compute-in-Memory accelerator.

Efficiency Breakthrough arxiv | Mar 16

The TaoBench benchmark shows that state-of-the-art math LLMs fail on logically equivalent problems when these are presented outside the standard Mathlib framework.

Breaks Assumption arxiv | Mar 16

A self-supervised robotic system detects novel objects by training bespoke detectors on-the-fly from human video demonstrations, bypassing language-based prompts.

New Capability arxiv | Mar 16

AIM enables post-training modulation of large models to change utility levels or focus features without any retraining or additional data.

New Capability arxiv | Mar 16

Achieves 4x visual token compression and 80% lower training cost while unifying multimodal comprehension and generation.

Efficiency Breakthrough arxiv | Mar 16

First training-free method for debiasing reward models using Sparse Autoencoder (SAE) interventions.

New Capability arxiv | Mar 16

Breaks the long-standing accuracy-robustness trade-off in VLMs by localizing adversarial robustness to shallow layers.

Breaks Assumption arxiv | Mar 16

A flow-based navigation policy that achieves zero-shot sim-to-real transfer across wheeled, quadrupedal, and humanoid platforms.

New Capability arxiv | Mar 16

A small-scale molecular reasoning model that outperforms ultra-large foundation models via structured chain-of-thought and RL.

Paradigm Shift arxiv | Mar 16

Adaptive VLM Routing reduces inference costs for Computer Use Agents by up to 78% with negligible accuracy loss.

Efficiency Breakthrough arxiv | Mar 16

Distills a 2B Vision-Language Retriever into a 70M text-only encoder for visual document retrieval with 50x lower latency.

Efficiency Breakthrough arxiv | Mar 16

Reveals that 'reasoning' gains in fine-tuned LLMs may be artifacts of task familiarity rather than improved capability.

Breaks Assumption arxiv | Mar 16

MotionAnymesh automatically transforms static 3D meshes into simulation-ready, articulated digital twins for robotics using vision-language models grounded in physical priors.

New Capability arxiv | Mar 16

ThinkStream introduces a 'Watch-Think-Speak' paradigm for video reasoning that allows models to incrementally update understanding and decide when to respond in real-time.

Paradigm Shift arxiv | Mar 16

This paper presents an exact federated unlearning protocol for foundation models that is pointwise identical to centralized retraining but uses fixed-size messages.

Breaks Assumption arxiv | Mar 16

CleanSight provides a training-free, test-time defense for backdoored vision-language models by detecting and pruning 'attention stealing' visual tokens.

Efficiency Breakthrough arxiv | Mar 16

This study proves that even with a 'perfect' noise transition matrix, statistically consistent noise-correction methods still suffer from performance collapse.

Breaks Assumption arxiv | Mar 16

Structured distillation for personalized agent memory achieves an 11x reduction in token count while preserving 96% of the retrieval quality of verbatim history.

Efficiency Breakthrough arxiv | Mar 16

Multimodal OCR (MOCR) treats charts, diagrams, and tables as code-level targets (e.g., TikZ, SVG) rather than just cropping them as pixels.

New Capability arxiv | Mar 16

A cross-dataset study reveals that modern general-purpose vision models (GP-VMs) outperform specialized medical architectures in 2D medical image segmentation.

Breaks Assumption arxiv | Mar 16

Connects DDIM reverse chains to fractal geometry, providing a mathematical explanation for why diffusion models switch from global context to local detail.

Paradigm Shift arxiv | Mar 16

Reveals that linearized attention never converges to the NTK limit in practice, explaining its unique 'influence malleability' compared to standard networks.

Breaks Assumption arxiv | Mar 16

Induces pretrained video models to perform SOTA image restoration using less than 2% of the training data required by specialized architectures.

Efficiency Breakthrough arxiv | Mar 16

Achieves 'zero-hyperparameter' circuit analysis by using a foundation model to perform in-context regression, bypassing hours of manual tuning.

Efficiency Breakthrough arxiv | Mar 16

Proposes Causal Process Reward (CPR) to fix 'cherry-picking' in MLLM reasoning by coupling answer correctness with step-level logical alignment.

Paradigm Shift arxiv | Mar 16

Introduces Bilateral Context Conditioning to DeepSeek's GRPO, allowing models to cross-reference successful and failed reasoning traces during optimization.

Efficiency Breakthrough arxiv | Mar 16

Enables RMSNorm to reuse MXFP8 block scales, shrinking the reduction operation by 32x and yielding a 2.4x kernel speedup.

Efficiency Breakthrough arxiv | Mar 16

Finds that privacy vulnerability and utility are both concentrated in a tiny fraction of 'critical weights' based on their location rather than value.

Breaks Assumption arxiv | Mar 16

STEVO-Bench reveals that current 'video world models' fail to simulate physical processes when the camera looks away or lights go out.

Breaks Assumption arxiv | Mar 16

Optimizes diffusion models via Direct Preference Optimization (DPO) to generate human motion that is inherently executable by real humanoid robots.

New Capability arxiv | Mar 16

Reimagines 3D molecules as continuous vector fields rather than discrete graphs, decoupling structure learning from atom types.

Paradigm Shift arxiv | Mar 16

Proves the existence of a 'distributional simplicity bias' in diffusion models, where low-order statistics are learned linearly while high-order correlations require cubic sample complexity.

Scaling Insight arxiv | Mar 16

Time moving forward might just be a glitch caused by the universe being bad at copying its own homework.

Paradigm Challenge arxiv | Mar 13

We’ve finally made digital messages that are physically impossible to copy—even a perfect hacker couldn't do it because physics won't allow it.

Practical Magic arxiv | Mar 13

Scientists built an AI that treats crop-raiding elephants like chess opponents to predict exactly where they’ll strike next.

Nature Is Weird arxiv | Mar 13

The massive satellite network the government uses is accidentally blasting out people's private passwords in plain text for anyone to see.

Cosmic Scale arxiv | Mar 13

OpenSanctions Pairs releases a massive benchmark for entity matching, proving that local LLMs can now match production rule-based systems in high-stakes compliance tasks.

Open Release arxiv | Mar 13

Speculative Decoding Scaling Laws (SDSL) provides a theoretical framework to predict optimal throughput hyperparameters for LLM inference systems before pre-training.

Scaling Insight arxiv | Mar 13

This paper introduces a graph tokenization framework that allows standard Transformers like BERT to beat specialized Graph Neural Networks without any architectural changes.

Paradigm Shift arxiv | Mar 13

The first open recipe for training embodied intelligence at the 1,000-GPU scale, achieving a 40x speedup in training cycles for GR00T models.

Efficiency Breakthrough arxiv | Mar 13

Routing signatures reveal that MoE experts are highly task-specific, allowing a simple linear classifier to identify task categories with 92.5% accuracy based only on routing patterns.

Breaks Assumption arxiv | Mar 13

A new method for training axis-aligned decision trees using gradient descent and backpropagation, allowing trees to be integrated into end-to-end neural networks.

New Capability arxiv | Mar 13

REOPOLD achieves 10x better sample efficiency in reasoning distillation, enabling 7B models to match 32B teachers with significantly less training data.

Efficiency Breakthrough arxiv | Mar 13

PACED introduces a weight kernel that focuses distillation on the 'Zone of Proximal Development,' where the student's gradient signal-to-noise ratio is highest.

Efficiency Breakthrough arxiv | Mar 13

Continual Representation Learning (CoRe) moves PEFT from weight-level updates to representation-space interventions, solving catastrophic forgetting in dynamic environments.

Paradigm Shift arxiv | Mar 13

Cyber-attack capabilities of AI models scale log-linearly with inference-time compute, with no plateau in sight.

Scaling Insight arxiv | Mar 13

SoLA introduces the first reversible model editing framework that allows precise revocation of specific knowledge updates.

New Capability arxiv | Mar 13

LLM-based user simulators create an 'easy mode' for agents that fails to capture real human frustration, ambiguity, and feedback nuances.

Breaks Assumption arxiv | Mar 13

Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.

Breaks Assumption arxiv | Mar 13

InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.

Efficiency Breakthrough arxiv | Mar 13

Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.

Paradigm Shift arxiv | Mar 13

HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.

Paradigm Shift arxiv | Mar 13

Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.

Scaling Insight arxiv | Mar 13

RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.

New Capability arxiv | Mar 13

TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.

Efficiency Breakthrough arxiv | Mar 13

MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.

Breaks Assumption arxiv | Mar 13

An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.

Breaks Assumption arxiv | Mar 13

This paper introduces Finsler geometry to manifold learning, allowing for the capture of asymmetric data relationships like density hierarchies that Riemannian methods ignore.

Paradigm Shift arxiv | Mar 13

Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.

Breaks Assumption arxiv | Mar 13

DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.

Efficiency Breakthrough arxiv | Mar 13

Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.

Breaks Assumption arxiv | Mar 13
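
The normalization argument in the entry above can be illustrated with a minimal sketch. It is not taken from the paper: the logit values and the single "sink" position are assumptions chosen for illustration. Because softmax weights always sum to 1, a head that wants to ignore every content token still has to place its attention mass somewhere; an extra position with a large logit gives it a place to do so.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three content tokens the head would like to ignore.
content_logits = [-2.0, -2.0, -2.0]

# Without a sink, the content tokens must share all of the attention mass.
no_sink = softmax(content_logits)
content_mass_no_sink = sum(no_sink)

# With one extra "sink" position (assumed logit 4.0), almost all of the mass
# moves to the sink, so the content tokens are effectively ignored.
sink_logit = 4.0
with_sink = softmax([sink_logit] + content_logits)
content_mass_with_sink = sum(with_sink[1:])

print(content_mass_no_sink, content_mass_with_sink)
```

The first printed value is 1 up to rounding no matter how negative the content logits are made; only the added position lets the mass on content tokens approach zero.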

LongFlow provides an 11x throughput boost for reasoning models by optimizing the KV cache specifically for long-output (rather than long-input) scenarios.

Efficiency Breakthrough arxiv | Mar 13

Manifold-Optimal Guidance reformulates Classifier-Free Guidance (CFG) as a Riemannian control problem, eliminating the artifacts and saturation typical of high guidance scales.

Paradigm Shift arxiv | Mar 13

Tiny Aya is a 3.35B parameter multilingual model that achieves state-of-the-art results across 70 languages, challenging the need for massive scale in global AI.

Open Release arxiv | Mar 13

An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.

Breaks Assumption arxiv | Mar 13

Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.

Efficiency Breakthrough arxiv | Mar 13

Expert Threshold Routing (ET) replaces standard top-k token-choice with an independent thresholding mechanism, achieving 1.6x faster training convergence.

Paradigm Shift arxiv | Mar 13
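
The contrast between top-k token-choice and independent thresholding can be sketched as follows. This is an illustrative toy, not the paper's method: the function names, the router scores, and the fixed per-expert thresholds are all assumptions, and the entry does not say how thresholds are chosen or trained.

```python
def top_k_route(scores, k):
    """Token-choice top-k: the token always picks its k highest-scoring experts."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def threshold_route(scores, thresholds):
    """Independent thresholding: each expert accepts the token on its own,
    whenever the token's score for that expert clears the expert's threshold."""
    return [i for i, (s, t) in enumerate(zip(scores, thresholds)) if s >= t]

# Hypothetical router scores for two tokens over four experts.
scores_easy = [0.90, 0.05, 0.03, 0.02]   # one expert clearly dominates
scores_hard = [0.40, 0.35, 0.20, 0.05]   # two experts are both relevant
thresholds = [0.30, 0.30, 0.30, 0.30]    # assumed fixed thresholds

print(top_k_route(scores_easy, 2), threshold_route(scores_easy, thresholds))
print(top_k_route(scores_hard, 2), threshold_route(scores_hard, thresholds))
```

In this toy, top-k always activates two experts, while thresholding activates one expert for the first token and two for the second, so the number of active experts varies per token.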

RoboClaw introduces 'Entangled Action Pairs' to allow robots to autonomously collect data by learning to reset their own environment.

New Capability arxiv | Mar 13

The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.

Breaks Assumption arxiv | Mar 13

Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.

Efficiency Breakthrough arxiv | Mar 13

Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.

Scaling Insight arxiv | Mar 13
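
What "RoPE on only 10% of hidden dimensions" means mechanically can be shown with a minimal sketch. The function `partial_rope`, its parameters, and the choice to rotate the leading dimensions are assumptions for illustration; the entry does not specify which dimensions the paper rotates.

```python
import math

def partial_rope(vec, position, rotary_fraction=0.1, base=10000.0):
    """Apply rotary embeddings to the first `rotary_fraction` of dimensions only.

    The remaining dimensions pass through unchanged, so cos/sin tables are
    only needed for the rotated slice.
    """
    dim = len(vec)
    rot_dim = int(round(dim * rotary_fraction))
    rot_dim -= rot_dim % 2  # rotation acts on pairs, so keep an even count
    out = list(vec)
    for i in range(0, rot_dim, 2):
        # Standard RoPE frequency schedule, restricted to the rotated slice.
        theta = position * base ** (-i / rot_dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out

# Hypothetical 40-dimensional query vector at position 7.
q = [1.0] * 40
q_rot = partial_rope(q, position=7)
print(q_rot[:4], q_rot[4:8])
```

With a 40-dimensional vector and a 10% fraction, only the first four components are rotated; a cos/sin cache for this layout would hold entries for 4 dimensions instead of 40.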

Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.

Efficiency Breakthrough arxiv | Mar 13

Shows that simple sequential fine-tuning with LoRA outperforms complex algorithms for continual reinforcement learning in VLA models.

Breaks Assumption arxiv | Mar 13

Proves that policy gradient algorithms naturally collapse entropy and provides a mathematical fix to preserve exploration and diversity.

Breaks Assumption arxiv | Mar 13

Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.

Efficiency Breakthrough arxiv | Mar 13

Introduces the Compression-Consistency Principle, arguing that LLMs prefer truth only when false alternatives are structurally harder to compress.

Paradigm Shift arxiv | Mar 13

Replaces unstructured LLM debates with 'Deliberative Collective Intelligence,' producing formal decision packets with minority reports and accountability trails.

New Capability arxiv | Mar 13

Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.

Scaling Insight arxiv | Mar 13

Enables agents to autonomously discover the group structure of their environments to learn disentangled representations without human priors.

Paradigm Shift arxiv | Mar 13

Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.

Efficiency Breakthrough arxiv | Mar 13

Automates the entire robotic data generation loop, including a self-resetting mechanism that restores unstructured workspaces without human intervention.

New Capability arxiv | Mar 13

Bridges the gap between parametric CAD and direct B-Rep synthesis using LLMs and primitive grounding.

New Capability arxiv | Mar 13

Eliminates lookahead bias in financial backtesting through a series of yearly-partitioned pretrained LLMs.

Paradigm Shift arxiv | Mar 13

Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.

Efficiency Breakthrough arxiv | Mar 13

Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.

Efficiency Breakthrough arxiv | Mar 13

Enables concurrent perception and reasoning for continuous video streams in Multimodal Large Language Models.

New Capability arxiv | Mar 13

Fits promptable visual segmentation (SAM) into a 1.3M parameter model for real-time in-sensor execution.

Efficiency Breakthrough arxiv | Mar 13

First framework for interpreting 4D molecular trajectories into natural language explanations.

New Capability arxiv | Mar 13

Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.

Scaling Insight arxiv | Mar 13

Solves GNN over-squashing by using global effective resistance to identify and rewire structural bottlenecks.

Paradigm Shift arxiv | Mar 13

Cross-domain sensor model that handles variable signal lengths and resolutions without retraining.

New Capability arxiv | Mar 13

Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.

Efficiency Breakthrough arxiv | Mar 13