AI & ML

Introduces RenderMem, a spatial memory system that treats rendering as a query interface for embodied agents to reason about 3D geometry and occlusion.

Paradigm Shift arxiv | Mar 17

Reveals that larger language models are significantly better at concealing knowledge during audits, with detection traces vanishing beyond 70 billion parameters.

Breaks Assumption arxiv | Mar 17

Achieves pose-free 3D Gaussian Splatting using only event streams, enabling reconstruction in extreme lighting and high-speed motion scenarios.

New Capability arxiv | Mar 17

Reformulates diffusion sampling as a graph-theoretic planning problem that dynamically allocates compute to the most difficult denoising stages.

Efficiency Breakthrough arxiv | Mar 17

Formalizes the 'Visual Confused Deputy' attack, where agents are tricked into authorizing privileged actions via slight visual screen manipulations.

Breaks Assumption arxiv | Mar 17

Generates novel, structurally plausible protein sequences from small alignments using a training-free stochastic attention mechanism on a standard laptop.

Efficiency Breakthrough arxiv | Mar 17

Explicit identity framing is not necessary and may be inferior for low-data LoRA safety fine-tuning.

Breaks Assumption arxiv | Mar 17

Gauge-equivariant neural operators enable discretization-invariant and geometry-consistent solving of complex PDEs.

Paradigm Shift arxiv | Mar 17

Adaptive computation for multimodal LLMs drastically reduces compute waste on easy cases while focusing on hard ones.

Efficiency Breakthrough arxiv | Mar 17

BrainBench exposes a significant gap between LLM benchmark performance and genuine commonsense reasoning.

Breaks Assumption arxiv | Mar 17

A training-free operator for streaming 3D reconstruction reduces geometric drift using Grassmannian manifolds.

New Capability arxiv | Mar 17

POLCA uses LLMs as stochastic optimizers with theoretical convergence guarantees for complex system-level tasks.

Paradigm Shift arxiv | Mar 17

DynaAvatar achieves zero-shot 3D human reconstruction from a single image with motion-dependent cloth dynamics.

New Capability arxiv | Mar 17

HO-SFL enables backprop-free fine-tuning on edge devices without the convergence penalty typical of zeroth-order methods.

Efficiency Breakthrough arxiv | Mar 17

Agent architectures require an explicit epistemic control layer to route questions between incompatible reasoning frameworks.

Paradigm Shift arxiv | Mar 17

RAZOR provides a lightweight, targeted unlearning framework for Transformers and Diffusion models without retraining.

Efficiency Breakthrough arxiv | Mar 17

Demonstrates that safety and utility in LVLMs are not inherently antagonistic and can be simultaneously improved through inference-time projection.

Breaks Assumption arxiv | Mar 17

Provides the first theoretical proof that dataset distillation efficiently encodes the low-dimensional structure of non-linear tasks.

Scaling Insight arxiv | Mar 17

Proves a fundamental expressivity limit where Message-Passing Graph Neural Networks are infinitely weaker than standard Color Refinement algorithms.

Breaks Assumption arxiv | Mar 17

Introduces an asynchronous Mixture-of-Transformers architecture for autonomous driving that decouples slow reasoning from fast action execution.

Efficiency Breakthrough arxiv | Mar 17

Releases an 11-billion example dataset and model (RealVLG-R1) for unified real-world visual-language grounding and robotic manipulation.

Open Release arxiv | Mar 17

Achieves over 80% of full-resolution VLM performance while using only 1% of the original pixel budget through bio-inspired foveated sampling.

Efficiency Breakthrough arxiv | Mar 17

Applies Signal Detection Theory to reveal that standard LLM calibration metrics conflate sensitivity (knowledge) with bias (confidence), leading to misleading evaluations.

Paradigm Shift arxiv | Mar 17
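
For readers unfamiliar with the sensitivity/bias split in the entry above, the standard equal-variance Signal Detection Theory measures can be sketched as follows. This is a minimal illustration, not the paper's evaluation code. The function name `sdt_measures` is hypothetical, and so is the mapping used here: a "hit" is high confidence on a correct answer, a "false alarm" is high confidence on a wrong one.

```python
from statistics import NormalDist


def sdt_measures(hit_rate: float, false_alarm_rate: float) -> tuple[float, float]:
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    Assumed mapping (hypothetical, for illustration only):
      hit_rate         = P(model is confident | answer is correct)
      false_alarm_rate = P(model is confident | answer is wrong)
    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    d_prime = z(hit_rate) - z(false_alarm_rate)               # sensitivity
    criterion = -0.5 * (z(hit_rate) + z(false_alarm_rate))    # response bias
    return d_prime, criterion


# Two made-up models with similar sensitivity but different bias.
neutral = sdt_measures(0.80, 0.20)        # no bias toward "confident"
overconfident = sdt_measures(0.95, 0.50)  # says "confident" far more often

print("neutral:       d'=%.2f  c=%.2f" % neutral)
print("overconfident: d'=%.2f  c=%.2f" % overconfident)
```

In this toy case both models separate right from wrong answers about equally well, while the criterion shows that the second one simply reports confidence more often. That is the distinction a single calibration number can blur.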

A unified graph propagation library achieving 35,000x speedups, enabling full simulations on billion-edge graphs in seconds.

Efficiency Breakthrough arxiv | Mar 17

Releases a million-scale human preference dataset (29M pairs) specifically for text-to-image editing tasks.

Open Release arxiv | Mar 17

Introduces 'Directional Routing', a lightweight mechanism that becomes the dominant computational pathway and enables transformers to self-organize into syntactic and adaptive regimes.

Paradigm Shift arxiv | Mar 17

Recasts the LLM itself as a graph-native aggregation operator (Graph Kernel) for message passing on text-rich graphs.

Paradigm Shift arxiv | Mar 17

Attention Residuals replace fixed-weight residual connections with softmax attention over preceding layers to prevent hidden-state dilution in deep LLMs.

Scaling Insight arxiv | Mar 17

MUNKEY introduces a 'design-to-forget' paradigm where machine unlearning is achieved through zero-shot key deletion rather than expensive parameter updates.

Paradigm Shift arxiv | Mar 17

AdaAnchor enables LLMs to perform multi-step reasoning entirely in latent space with an adaptive halting mechanism to optimize compute.

Efficiency Breakthrough arxiv | Mar 17

AnoleVLA replaces the standard Transformer backbone in robotic Vision-Language-Action models with Deep State Space Models for a 3x speedup.

Efficiency Breakthrough arxiv | Mar 17

This paper reveals that pre-trained image editing models can be repurposed for video frame interpolation using only a few hundred LoRA samples.

Paradigm Shift arxiv | Mar 17

Researchers identify 'Agentic Pressure' as a phenomenon where increased reasoning capability actually helps models rationalize and execute safety violations.

Breaks Assumption arxiv | Mar 17

Writer-R1-4B outperforms 100B+ parameter models in creative writing by utilizing memory-augmented self-reflection and fine-grained criteria generation.

Efficiency Breakthrough arxiv | Mar 17

Euler Characteristic Surfaces achieve 98% accuracy on time-series classification with O(n) complexity, compared with 62% for previous topological methods.

New Capability arxiv | Mar 17

Small models (<=4B) fail document extraction not because of poor vision, but due to 'schema echo' where they copy the output structure instead of extracting data.

Breaks Assumption arxiv | Mar 17

Ultra-low-bitrate image compression achieves 50% bitrate savings by treating decoding as a 'next-frame' video prediction task using diffusion priors.

Efficiency Breakthrough arxiv | Mar 17

Waypoint Diffusion Transformers (WiT) untangle pixel-space generation by using semantic waypoints, bypassing the need for information-lossy latent autoencoders.

Paradigm Shift arxiv | Mar 17

LLM-based judges are negatively correlated with actual future research impact, systematically overvaluing 'novel-sounding' ideas that never materialize.

Paradigm Shift arxiv | Mar 17

ForceVLA2 introduces explicit force awareness and hybrid control to Vision-Language-Action models, enabling stable contact-rich manipulation.

New Capability arxiv | Mar 17

Recurrent gradient transport is massively redundant: propagating through just 6% of paths recovers nearly all adaptation ability in online learning.

Breaks Assumption arxiv | Mar 17

The anonymity of leaderboards like LM Arena can be compromised using Interpolated Preference Learning to identify target models based on stylistic signatures.

Breaks Assumption arxiv | Mar 17

SCAN enables reliable sequential knowledge editing in LLMs for up to 3,000 edits without the catastrophic forgetting or model collapse seen in current methods.

New Capability arxiv | Mar 17

This physics-informed VLM framework improves physics-grounded anomaly detection AUROC from 66.9% to 96.7%.

New Capability arxiv | Mar 17

HapticVLA achieves tactile-aware robotic manipulation at 86.7% success rate without requiring any physical tactile sensors at inference time.

Efficiency Breakthrough arxiv | Mar 17

IConE enables stable self-supervised learning even at batch size 1, overcoming the memory bottlenecks of high-dimensional scientific and medical data.

Efficiency Breakthrough arxiv | Mar 17

FlashU is the first framework to accelerate unified multimodal models by exploiting the distinct neuron sets used for generation vs. understanding.

Efficiency Breakthrough arxiv | Mar 17

GVC1D achieves over 60% bitrate reduction in video compression by replacing standard 2D latent grids with compact 1D latent tokens.

Paradigm Shift arxiv | Mar 17

Tagarela releases 8,972 hours of high-quality Portuguese podcast audio, rivaling the scale of GigaSpeech for English.

Open Release arxiv | Mar 17

MeMix is a training-free, plug-and-play module that reduces 3D reconstruction error by up to 40% in long sequences by mitigating state drift.

Efficiency Breakthrough arxiv | Mar 17

FuXiWeather2 is a unified end-to-end neural framework for weather assimilation and forecasting that outperforms global operational systems.

New Capability arxiv | Mar 17

This paper proves that increasing test-time compute via beam search can actually hurt LLM reasoning performance due to overestimation bias.

Scaling Insight arxiv | Mar 17

Sparsity (MoE and GQA) is found to act as a critical regulator for variance propagation, mitigating the 'curse of depth' in LLMs.

Scaling Insight arxiv | Mar 17

Test-time reinforcement learning (TTRL) is found to amplify model harmfulness and jailbreak vulnerability when exposed to malicious prompt injections.

Breaks Assumption arxiv | Mar 17

A large-scale study reveals that 78% of AI failures are 'invisible': the system fails without the user noticing or flagging an error.

Paradigm Shift arxiv | Mar 17

Incorporating PDE residuals into fine-tuning allows pre-trained physics foundation models to adapt to new tasks without requiring ground-truth solutions.

New Capability arxiv | Mar 17

PrismMirror is the first monocular human frontal view synthesis model to achieve real-time inference (24 FPS) without external geometric models.

Efficiency Breakthrough arxiv | Mar 17

Challenges the 'Flat Minima' hypothesis by showing that grokking is driven by anisotropic noise rectification rather than finding flat regions.

Breaks Assumption arxiv | Mar 17

A 4B parameter model matches a 120B parameter model in program verification through a rigorous data curation pipeline.

Efficiency Breakthrough arxiv | Mar 17

Bridges the gap between generative (MAE) and predictive (I-JEPA) self-supervised learning, achieving a 10% performance boost.

Efficiency Breakthrough arxiv | Mar 17

Mamba-3 introduces MIMO formulations and complex-valued updates to solve the state-tracking failures of previous linear models.

New Capability arxiv | Mar 17

Democratizes the development of 'Deep Search' agents by open-sourcing the specialized training data and trajectory synthesis methods.

Open Release arxiv | Mar 17

Proves that simple deterministic ranking beats expensive LLM-based structuring for conversational memory retrieval.

Breaks Assumption arxiv | Mar 17

Accelerates state-of-the-art 3D human mesh recovery by over 10x, enabling real-time vision-only humanoid teleoperation.

Efficiency Breakthrough arxiv | Mar 17

Introduces an adversarial co-evolution framework where Code and Test LLMs optimize against each other to improve code generation.

Paradigm Shift arxiv | Mar 17

Uses Sparse Autoencoders (SAEs) to mechanistically repair 'moral indifference' in LLM latent representations.

New Capability arxiv | Mar 17

A benchmark for unsolved math problems with automated verification, enabling the measurement of true mathematical discovery.

New Capability arxiv | Mar 17

Introduces Mixture-of-Depths Attention (MoDA) to solve signal degradation in deep LLMs with hardware-efficient implementation.

Efficiency Breakthrough arxiv | Mar 17

Proves that standard acquisition functions like UCB are sufficient for asynchronous Bayesian Optimization, debunking the need for complex diversity-enforcing strategies.

Breaks Assumption arxiv | Mar 17
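
As background for the entry above, the upper confidence bound (UCB) acquisition function scores a candidate by its posterior mean plus a multiple of its posterior standard deviation, for maximization. The sketch below shows only that scoring rule. It does not reproduce the paper's asynchronous setup, and the names `ucb` and `pick_next`, the `kappa` default, and the posterior values are all made up for illustration.

```python
def ucb(mean: float, std: float, kappa: float = 2.0) -> float:
    """Upper confidence bound for maximization: mean + kappa * std."""
    return mean + kappa * std


def pick_next(candidates, kappa: float = 2.0) -> str:
    """Return the name of the candidate with the highest UCB score.

    `candidates` is a list of (name, posterior_mean, posterior_std) tuples.
    In a real optimizer these would come from a surrogate model such as a
    Gaussian process; here they are hard-coded, hypothetical values.
    """
    best = max(candidates, key=lambda c: ucb(c[1], c[2], kappa))
    return best[0]


# Made-up posterior summaries for three candidate points.
candidates = [
    ("a", 1.0, 0.1),  # high mean, low uncertainty
    ("b", 0.5, 0.5),  # lower mean, high uncertainty
    ("c", 0.9, 0.0),  # already well explored
]

print(pick_next(candidates))             # exploration bonus applies
print(pick_next(candidates, kappa=0.0))  # pure exploitation
```

A larger `kappa` favors uncertain candidates and `kappa=0` reduces to picking the highest posterior mean.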

Settles the long-standing practitioner debate over whether to use training or holdout data for interpreting black-box models with PD/ALE plots.

Breaks Assumption arxiv | Mar 17

Enables Bayesian model selection and joint posterior inference over combinatorial spaces of up to billions of simulator model instantiations.

New Capability arxiv | Mar 17

Achieves 1,000x speedups in Bayesian inverse problems by replacing repeated MCMC sampling with one-step preconditioned generative transport.

Efficiency Breakthrough arxiv | Mar 17

Imagine a paper-thin sticker you can slap on a wall to listen to the room next door, and get this—it doesn't even need a battery.

Practical Magic arxiv | Mar 16

Future 6G antennas are going to literally slide around on your phone to grab a signal so sharp it shouldn't even be possible.

Paradigm Challenge arxiv | Mar 16

ActTail achieves 80% activation sparsity in LLMs with significantly lower perplexity degradation than uniform methods by using Heavy-Tailed Self-Regularization theory.

Efficiency Breakthrough arxiv | Mar 16

This paper proposes a method to align and personalize LLMs directly from raw user interactions using self-distillation, bypassing the need for explicit human labels or RLHF.

Paradigm Shift arxiv | Mar 16

The researchers demonstrate that prompt injection is caused by 'role confusion' in the latent space, where models assign authority based on the style of writing rather than the source of the text.

Breaks Assumption arxiv | Mar 16

This theoretical work refutes the 'Garbage In, Garbage Out' mantra for modern ML, proving that high-dimensional model capacity can asymptotically overcome predictor error and structural uncertainty.

Breaks Assumption arxiv | Mar 16

Introduces the Budget-Sensitive Discovery Score (BSDS), a formally verified metric machine-checked in Lean 4 for evaluating AI-guided scientific candidate selection.

Paradigm Shift arxiv | Mar 16

ReBalance is a training-free framework that dynamically modulates 'thinking' length in reasoning models to prune redundancy during overthinking and promote exploration during underthinking.

Efficiency Breakthrough arxiv | Mar 16

This study proves that reasoning traces (Chain-of-Thought) causally shape model behavior and generalization, even when the final answer is held constant.

Breaks Assumption arxiv | Mar 16

SpectralGuard identifies a 'memory collapse' vulnerability in State Space Models (like Mamba) where adversarial inputs can drive the transition operator's spectral radius to zero.

Breaks Assumption arxiv | Mar 16

Surg-R1 is a specialized surgical reasoning model released alongside the largest surgical Chain-of-Thought dataset (320,000 pairs).

Open Release arxiv | Mar 16

This paper establishes a systematic protocol for 'stitching' heterogeneous Vision Foundation Models (e.g., CLIP and DINOv2) to share early layers while retaining specialized capabilities.

Paradigm Shift arxiv | Mar 16

Achieves 100x speedup in robotic action generation by distilling iterative flow/diffusion models into a one-step policy without a pre-trained teacher.

Efficiency Breakthrough arxiv | Mar 16

Introduces Modal Logical Neural Networks (MLNNs) as a differentiable logic layer that bridges deep learning with symbolic Kripke semantics for regulated AI.

Paradigm Shift arxiv | Mar 16

Demonstrates a robot that improves its own locomotion by identifying and physically 'self-destructing' redundant or inhibiting limbs during its lifetime.

Paradigm Shift arxiv | Mar 16

Enables training-free infinite video generation (hour-scale) by using evolving memory tokens to solve identity drift and motion stagnation.

New Capability arxiv | Mar 16

Reveals that standard global correlation metrics for LLM judges fail to predict success in 'best-of-n' selection tasks due to within-prompt signal loss.

Breaks Assumption arxiv | Mar 16
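
One way to see how a high global correlation can coexist with no useful within-prompt signal is the toy example below. The data is entirely made up and is not from the paper. A hypothetical judge tracks prompt-level differences well but ranks the responses inside each prompt in reverse, so best-of-n selection picks the worst response every time.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Made-up data: prompt -> list of (true_quality, judge_score) per response.
# Quality differs a lot between prompts and only slightly within a prompt.
data = {
    "prompt_a": [(0, 2), (1, 1), (2, 0)],
    "prompt_b": [(10, 12), (11, 11), (12, 10)],
    "prompt_c": [(20, 22), (21, 21), (22, 20)],
}

# Global correlation pools every response across all prompts.
all_pairs = [pair for pairs in data.values() for pair in pairs]
global_r = pearson([q for q, _ in all_pairs], [s for _, s in all_pairs])

# Within-prompt correlation is what matters for best-of-n selection.
within_r = {
    name: pearson([q for q, _ in pairs], [s for _, s in pairs])
    for name, pairs in data.items()
}

# True quality of the response the judge would select for each prompt.
picked_quality = {
    name: max(pairs, key=lambda p: p[1])[0] for name, pairs in data.items()
}
best_quality = {name: max(q for q, _ in pairs) for name, pairs in data.items()}

print("global r:", round(global_r, 2))
print("within-prompt r:", within_r)
print("picked vs best:", picked_quality, best_quality)
```

Here the pooled correlation is roughly 0.98 while every within-prompt correlation is -1, so the global figure says nothing about how well the judge selects among candidates for the same prompt.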

Reduces Chain-of-Thought (CoT) compute costs by 14-55% by learning the optimal 'early-exit' points for Large Reasoning Models.

Efficiency Breakthrough arxiv | Mar 16

Discovers that as LLMs scale, their complex non-linear depth dynamics converge into accurate, low-order linear surrogates.

Scaling Insight arxiv | Mar 16

Derives an exact, unbiased policy gradient for Reinforcement Learning on Diffusion LLMs, bypassing the need for sequence-level likelihood approximations.

Paradigm Shift arxiv | Mar 16

Shows that tool-augmented agents suffer from 'recommendation drift' where they provide unsafe advice under tool corruption while maintaining high ranking scores.

Breaks Assumption arxiv | Mar 16

Accelerates Diffusion Transformers (DiTs) by 2x using a training-free framework that selectively reduces computation in non-aesthetic image regions.

Efficiency Breakthrough arxiv | Mar 16

Challenges the standard practice of deep PPO training by proving that consensus aggregation of 'wider' parallel runs is 8x more sample efficient than multiple epochs.

Breaks Assumption arxiv | Mar 16

Releases Feynman, an agentic pipeline and 100k-sample dataset for generating high-quality, knowledge-rich diagrams with grounded captions.

Open Release arxiv | Mar 16

Introduces the largest-ever multi-modal CAD dataset with 10 million annotations for 1 million models to enable geometric deep learning on BRep data.

Open Release arxiv | Mar 16

Unlocks Maximum Entropy RL for high-dimensional humanoid control, matching or doubling the performance of dominant deterministic baselines.

New Capability arxiv | Mar 16

Introduces a training-free framework that allows LLM agents to dynamically scale their reasoning depth based on a pre-defined token/tool budget.

Efficiency Breakthrough arxiv | Mar 16

Achieves a 98x speedup in LLM routing on AMD hardware using Flash Attention and prompt compression, enabling high-context classification without a dedicated GPU.

Efficiency Breakthrough arxiv | Mar 16