AI & ML

1769 papers

Machine unlearning in LLMs is often a 'mirage' that can be bypassed using simple multi-hop reasoning or entity aliasing.

Breaks Assumption arxiv | Mar 13

InstantHDR achieves high-quality 3D HDR reconstruction 700x faster than current optimization-based methods.

Efficiency Breakthrough arxiv | Mar 13

Theoretical analysis proves that Langevin dynamics is fundamentally non-robust to score function errors, justifying the shift to Diffusion Models.

Paradigm Shift arxiv | Mar 13

HAPO resolves the advantage collapse problem in sparse-reward RL for reasoning models using a Thompson-sampled hindsight mechanism.

Paradigm Shift arxiv | Mar 13

Adversarial prompt injection causes jailbreak success rates to transition from polynomial to exponential scaling with inference-time samples.

Scaling Insight arxiv | Mar 13

RewardHackingAgents establishes a benchmark for evaluating whether ML-engineering agents are actually solving tasks or just tampering with the evaluation code.

New Capability arxiv | Mar 13

TimeSqueeze achieves 20x faster convergence and 8x higher data efficiency for time-series foundation models by using dynamic, content-aware patching.

Efficiency Breakthrough arxiv | Mar 13

MirrorDrift demonstrates a successful SLAM-targeted attack on production-grade 'secure' LiDARs using simple actuated mirrors rather than complex signal injection.

Breaks Assumption arxiv | Mar 13

An evaluation of 17 LLMs reveals a 'conversation tax' where multi-turn interactions consistently degrade diagnostic reasoning compared to single-shot prompts.

Breaks Assumption arxiv | Mar 13

This paper introduces Finsler geometry to manifold learning, allowing for the capture of asymmetric data relationships like density hierarchies that Riemannian methods ignore.

Paradigm Shift arxiv | Mar 13

Re-evaluating high-profile medical AI safety claims reveals that reported triage failures were artifacts of the 'exam-style' evaluation format rather than model incapacity.

Breaks Assumption arxiv | Mar 13

DART enables real-time multi-class detection for open-vocabulary models like SAM3, achieving up to 25x speedup without any weight modifications.

Efficiency Breakthrough arxiv | Mar 13

Softmax normalization mathematically mandates the creation of attention sinks to serve as 'null states' when models need to ignore input.

Breaks Assumption arxiv | Mar 13

LongFlow provides an 11x throughput boost for reasoning models by optimizing the KV cache specifically for long-output (rather than long-input) scenarios.

Efficiency Breakthrough arxiv | Mar 13

Manifold-Optimal Guidance reformulates Classifier-Free Guidance (CFG) as a Riemannian control problem, eliminating the artifacts and saturation typical of high guidance scales.

Paradigm Shift arxiv | Mar 13

Tiny Aya is a 3.35B-parameter multilingual model that achieves state-of-the-art results across 70 languages, challenging the need for massive scale in global AI.

Open Release arxiv | Mar 13

An empirical study reveals that models under 7B parameters have a fundamental utilization bottleneck that prevents them from using retrieved context effectively.

Breaks Assumption arxiv | Mar 13

Mobile-GS achieves real-time Gaussian Splatting on mobile devices by replacing the sorting-based alpha-blending bottleneck with depth-aware order-independent rendering.

Efficiency Breakthrough arxiv | Mar 13

Expert Threshold Routing (ET) replaces standard top-k token-choice with an independent thresholding mechanism, achieving 1.6x faster training convergence.

Paradigm Shift arxiv | Mar 13

RoboClaw introduces 'Entangled Action Pairs' to allow robots to autonomously collect data by learning to reset their own environment.

New Capability arxiv | Mar 13

The discovery of 'Helicoid Dynamics' identifies a critical safety failure where frontier LLMs accurately name their reasoning errors but are structurally unable to stop repeating them.

Breaks Assumption arxiv | Mar 13

Achieves 99.5% performance on Needle-In-A-Haystack benchmarks while retaining only 3% of the KV cache budget.

Efficiency Breakthrough arxiv | Mar 13

Applying Rotary Positional Embeddings (RoPE) to only 10% of hidden dimensions is sufficient for full model convergence, enabling 10x memory savings in positional caches.

Scaling Insight arxiv | Mar 13

Distills high-fidelity joint audio-visual generation into a real-time streaming model capable of 25 FPS on a single GPU.

Efficiency Breakthrough arxiv | Mar 13

Shows that simple sequential fine-tuning with LoRA outperforms complex algorithms for continual reinforcement learning in VLA models.

Breaks Assumption arxiv | Mar 13

Proves that policy gradient algorithms naturally collapse entropy and provides a mathematical fix to preserve exploration and diversity.

Breaks Assumption arxiv | Mar 13

Achieves hour-scale real-time human animation by solving the unbounded memory growth and inconsistent noise states in autoregressive diffusion.

Efficiency Breakthrough arxiv | Mar 13

Introduces the Compression-Consistency Principle, arguing that LLMs prefer truth only when false alternatives are structurally harder to compress.

Paradigm Shift arxiv | Mar 13

Replaces unstructured LLM debates with 'Deliberative Collective Intelligence,' producing formal decision packets with minority reports and accountability trails.

New Capability arxiv | Mar 13

Provides a learning-theoretic characterization of model collapse, proving exactly when replaying past outputs destroys model diversity.

Scaling Insight arxiv | Mar 13

Enables agents to autonomously discover the group structure of their environments to learn disentangled representations without human priors.

Paradigm Shift arxiv | Mar 13

Unifies leading membership inference attacks into a single framework and uses Bayesian variance inference to enable privacy auditing with 10x less compute.

Efficiency Breakthrough arxiv | Mar 13

Automates the entire robotic data generation loop, including a self-resetting mechanism that restores unstructured workspaces without human intervention.

New Capability arxiv | Mar 13

Bridges the gap between parametric CAD and direct B-Rep synthesis using LLMs and primitive grounding.

New Capability arxiv | Mar 13

Eliminates lookahead bias in financial backtesting through a series of yearly-partitioned pretrained LLMs.

Paradigm Shift arxiv | Mar 13

Recovers hidden ODE parameters from sparse data with a 487x speedup over gradient-based methods.

Efficiency Breakthrough arxiv | Mar 13

Eliminates the 2.5x latency penalty of dynamic adapters in LLMs via pre-gating and fused CUDA kernels.

Efficiency Breakthrough arxiv | Mar 13

Enables concurrent perception and reasoning for continuous video streams in Multimodal Large Language Models.

New Capability arxiv | Mar 13

Fits promptable visual segmentation (SAM) into a 1.3M-parameter model for real-time in-sensor execution.

Efficiency Breakthrough arxiv | Mar 13

First framework for interpreting 4D molecular trajectories into natural language explanations.

New Capability arxiv | Mar 13

Exhaustive circuit mapping of a biological foundation model reveals massive redundancy and annotation bias.

Scaling Insight arxiv | Mar 13

Solves GNN over-squashing by using global effective resistance to identify and rewire structural bottlenecks.

Paradigm Shift arxiv | Mar 13

Cross-domain sensor model that handles variable signal lengths and resolutions without retraining.

New Capability arxiv | Mar 13

Achieves high-fidelity one-step (1 NFE) 3D robotic manipulation using training-time drifting fields.

Efficiency Breakthrough arxiv | Mar 13

Introduces the first billion-scale SAR vision foundation model and a massive unified benchmark for all-weather geospatial semantic segmentation.

Open Release arxiv | Mar 13

Demonstrates that simply using XML tags during translation outperforms complex pipelines for cross-lingual label projection while actually improving translation quality.

Breaks Assumption arxiv | Mar 13

Achieves up to 14.4x higher decoding throughput in long-context LLMs via a training-free framework that reuses sparse memory at semantic boundaries.

Efficiency Breakthrough arxiv | Mar 13

Enables multimodal agents to continually improve from experience and skills without any parameter updates through a dual-stream visual grounding framework.

New Capability arxiv | Mar 13

A 3D vision-language pipeline that grounds medical diagnosis in longitudinal brain MRI via regional volumetric assessments to eliminate VLM hallucinations.

New Capability arxiv | Mar 13

Integrates Neural ODEs with NeRFs to enable continuous-time scene dynamics that can extrapolate far beyond the original training sequence.

New Capability arxiv | Mar 13

Proposes a unified image tokenizer that reconciles the conflicting requirements of visual understanding and generation using a residual evolution process.

Paradigm Shift arxiv | Mar 13

Identifies and solves the 'information self-locking' failure mode where RL-trained agents stop asking informative questions in active reasoning tasks.

Breaks Assumption arxiv | Mar 13

A specialized distributed serving system for 'Any-to-Any' multimodal models that achieves 5.79x lower tail latency via component disaggregation.

Efficiency Breakthrough arxiv | Mar 13

Shows that LLM self-correction fails primarily due to 'session context' and can be significantly improved by moving the review to a fresh, independent session.

Breaks Assumption arxiv | Mar 13

Automates the generation of GPU-parallelized RL environments from text/code specifications, achieving up to 22,000x speedups for less than $10.

Efficiency Breakthrough arxiv | Mar 13

Establishes scaling laws for sampling compute in LLM Reinforcement Learning, providing a playbook for optimal parallel rollout and batch allocation.

Scaling Insight arxiv | Mar 13

Selects high-quality synthetic code data using 'Reverse Mutual Information' to achieve full-dataset performance with 75% less data.

Efficiency Breakthrough arxiv | Mar 13

Accelerates sparse attention by 75% by reusing lightning indexer decisions across layers, tackling the hidden bottleneck in production-grade LLMs.

Efficiency Breakthrough arxiv | Mar 13

Discovers that task-specific experts are so dense around pretrained weights that random parameter perturbations can compete with complex RL methods like PPO.

Breaks Assumption arxiv | Mar 13

Reveals that 'Reasoning LLMs-as-Judges' can lead to policies that generate highly effective adversarial outputs to deceive other judges and inflate benchmarks.

Breaks Assumption arxiv | Mar 13

Introduces a feature-matching objective for LLM fine-tuning that targets sequence-level statistics without requiring reward models or ground-truth verifiers.

Paradigm Shift arxiv | Mar 13

Integrates Chain-of-Thought reasoning directly into the Diffusion Transformer denoising process to solve complex spatial and logical tasks.

New Capability arxiv | Mar 13

Reduces visual tokens by up to 100x using an autoregressive gazing module, enabling 19x faster 4K/1000-frame video understanding.

Efficiency Breakthrough arxiv | Mar 13

Uncovers an emergent Hue-Saturation-Lightness (HSL) subspace in FLUX.1's VAE latent space, allowing for precise, training-free color control.

Breaks Assumption arxiv | Mar 13

Enables VideoLLMs to perform complex logical reasoning simultaneously with video playback without incurring the latency of standard test-time scaling.

New Capability arxiv | Mar 13

An open foundation model for humanoid robots that achieves high performance using only 30 hours of real-world robot data by pre-training on egocentric human videos.

Open Release arxiv | Mar 13

A unified streaming visual backbone that performs perception, 3D reconstruction, and robotic action simultaneously from a continuous video stream.

New Capability arxiv | Mar 13

Introduces adaptive video tokenization that allocates tokens based on scene complexity, reducing token usage by 24% while improving reconstruction quality.

Efficiency Breakthrough arxiv | Mar 13

Demonstrates that the stochasticity in standard regularized model training (like cross-validation) can serve as a 'free' and effective exploration strategy for contextual bandits.

Paradigm Shift arxiv | Mar 13