AI & ML


Identifies that MLLMs fail to perceive visual illusions due to a high-frequency attention bias and provides a plug-and-play fix that boosts accuracy from 13% to 84%.

New Capability arxiv | Mar 25

Polaris introduces a 'Gödel Agent' framework that allows 7B-parameter models to recursively improve their own policies through auditable code patches.

New Capability arxiv | Mar 25

DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.

Efficiency Breakthrough arxiv | Mar 25

Exposes a major flaw in medical super-resolution research where models trained on downsampled data fail to recover actual lost structures in real low-resolution scans.

Breaks Assumption arxiv | Mar 25

Connects stochastic optimal control to the Schrödinger equation, enabling analytic solutions for long-horizon problems that previously scaled exponentially with difficulty.

Paradigm Shift arxiv | Mar 25

ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.

Efficiency Breakthrough arxiv | Mar 25

Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.

Efficiency Breakthrough arxiv | Mar 25

Enables 3D medical image segmentation pre-training using only mathematical formulas and implicit functions, requiring zero real-world data or expert annotations.

Paradigm Shift arxiv | Mar 25

Develops a collaborative memory framework that distills agent-agnostic reasoning trajectories, allowing different LLMs to share a single memory system.

New Capability arxiv | Mar 25

Identifies functionally complete safety circuits in LLMs via differentiable binary masks, allowing for near-surgical removal of backdoors and jailbreaks.

New Capability arxiv | Mar 25

Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.

New Capability arxiv | Mar 25

Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation on pure images.

Efficiency Breakthrough arxiv | Mar 25

A dual-path architecture that combines speculative speech-to-speech prefixes with cascaded LLM continuations for zero-latency, high-quality dialogue.

Paradigm Shift arxiv | Mar 25

Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.

Efficiency Breakthrough arxiv | Mar 25

A biology-native transformer architecture that mirrors cellular transcription and translation, enabling interpretable predictions across DNA, RNA, and protein.

Paradigm Shift arxiv | Mar 25

A unified framework that decomposes monolithic 3D meshes into 'sim-ready' interactive articulated assets using a sparse 3D VQ-VAE.

New Capability arxiv | Mar 25

Exposes 'shortcut learning' in differentiable simulators where models non-causally exploit future information to 'regret' past mistakes rather than learning to recover.

Breaks Assumption arxiv | Mar 25

A generative framework for graphs that closes the fidelity gap between energy-based models and discrete diffusion.

New Capability arxiv | Mar 25

Introduces a 'geospatial model foundry' that learns unified representations from the weights of existing models rather than raw data.

Paradigm Shift arxiv | Mar 25

An online length-aware scheduling strategy that eliminates training 'bubbles' during the rollout phase of LLM reinforcement learning.

Efficiency Breakthrough arxiv | Mar 25

A bilevel framework where an outer LLM loop meta-optimizes an inner autoresearch loop by autonomously generating and injecting Python code at runtime.

New Capability arxiv | Mar 25

Integrates tactile perception into video-action models to enable high-fidelity force modulation in contact-rich robotic tasks.

New Capability arxiv | Mar 25

Enables training of monocular novel-view synthesis models using entirely unpaired, in-the-wild internet images.

Paradigm Shift arxiv | Mar 25

Leverages human gaze tracking to assign non-uniform token density in diffusion models, creating perceptually perfect images with significantly less compute.

Efficiency Breakthrough arxiv | Mar 25

Replaces visual token compression with sparse, dynamically selected vision-language interactions in VLLMs.

Efficiency Breakthrough arxiv | Mar 25

A unified reinforcement learning framework that jointly optimizes reasoning (text) and synthesis (image) for interleaved multimodal generation.

New Capability arxiv | Mar 25

Introduces on-the-fly quantization that calibrates to individual prompts during inference, solving the 'domain shift' problem where standard quantization fails on unseen data.

Efficiency Breakthrough arxiv | Mar 25

Provides a statistically rigorous framework to evaluate model performance and reliability after cherry-picking or selecting models based on the same test data.

Paradigm Shift arxiv | Mar 25

Develops a differentially private RLHF pipeline that decouples private reward learning from policy optimization, achieving strong alignment on Gemma-2B-IT with privacy guarantees.

New Capability arxiv | Mar 25

AI is actually the most confident when it's completely making stuff up.

Paradigm Challenge arxiv | Mar 24

Future phones might have 'liquid' antennas that literally swim around inside the device to hunt down a better signal.

Practical Magic arxiv | Mar 24

A massive study found women do way more innovative science than men, but they still get robbed when it's time for the credit.

Paradigm Challenge arxiv | Mar 24

Scientists found a way to make a basic home computer screw up math exactly like a super-expensive AI chip does.

Practical Magic arxiv | Mar 24

A core rule of tech just got an update, and it turns out those fancy AI chips might eventually be totally useless.

Paradigm Challenge arxiv | Mar 24

New 360-degree video treats things on screen like they have gravity, just so it can predict exactly where you're gonna look next.

Practical Magic arxiv | Mar 24

Your future phone might have antennas that physically slide along tracks to 'pinch' the best Wi-Fi signal possible.

Practical Magic arxiv | Mar 24

An AI just 'figured out' how to lock down its own code using high-level math without a human ever telling it how.

Paradigm Challenge arxiv | Mar 24

Engineers built 'invisible' backdoors into computer chips that are so well-hidden, even the most powerful microscopes can't find them.

Nature Is Weird arxiv | Mar 24

Scientists found one single math formula that explains why everything from stock market crashes to earthquakes actually happens.

Nature Is Weird arxiv | Mar 24

Researchers built an AI sensor that 'thinks' using light ripples, letting it spot objects in about 25 billionths of a second.

Practical Magic arxiv | Mar 24

Researchers found one 'master' math trick that can recreate every single function on your old scientific calculator.

Paradigm Challenge arxiv | Mar 24

There’s a new AI that can tell you an animal’s whole lifestyle and what it looks like just by listening to it make a sound.

Nature Is Weird arxiv | Mar 24

A new voting system lets you check if a national election was legit using just basic math and zero computers.

Practical Magic arxiv | Mar 24

New math can spot life-threatening internal bleeding in patients before doctors can even see it.

Practical Magic arxiv | Mar 24

Those single scores we use to rank people on things like intelligence might actually be mathematical illusions.

Paradigm Challenge arxiv | Mar 24

AI can now map out the secret relationships between terrorist groups that they try to keep hidden.

Practical Magic arxiv | Mar 24

Achieves over 10x faster sampling for diffusion language models by shifting the process into continuous semantic space.

Efficiency Breakthrough arxiv | Mar 24

Integrates fast scalar rewards with slow generative CoT reasoning to reduce reward model token consumption by 20%.

Efficiency Breakthrough arxiv | Mar 24

Enables precise prompt routing by predicting the expected reward of a model before any response is generated.

Efficiency Breakthrough arxiv | Mar 24

Introduces a training strategy where Transformers 'think' in latent space before committing to discrete tokens.

Paradigm Shift arxiv | Mar 24

Composes pre-trained unimanual robotic policies into complex bimanual tasks without requiring bimanual demonstration data.

New Capability arxiv | Mar 24

Sets a new state of the art for intracortical speech decoding, with a 14.3% phoneme error rate, using a multitask Transformer.

New Capability arxiv | Mar 24

Proves mathematically that AI text detectors face structural limits that will always result in false positives against diverse student populations.

Breaks Assumption arxiv | Mar 24

The first foundation model for zero-shot prediction of joint probability distributions in coupled time series.

Paradigm Shift arxiv | Mar 24

Reduces Tree of Thought (ToT) computational overhead by up to 75% using plug-and-play predictors for pruning.

Efficiency Breakthrough arxiv | Mar 24

Formalizes 'Introspection' in LLMs and proves they have privileged access to their own policy logic beyond mere self-simulation.

Paradigm Shift arxiv | Mar 24

Releases an offline search-and-browse pipeline with 97K long-horizon trajectories for training 'Deep Research' agents.

Open Release arxiv | Mar 24

Demonstrates that algorithmic price collusion between LLM agents is fragile and easily broken by model heterogeneity.

Breaks Assumption arxiv | Mar 24

STAC achieves a 10x memory reduction and 4x speedup for real-time streaming 3D reconstruction using spatio-temporal cache compression.

Efficiency Breakthrough arxiv | Mar 24

AgentComm-Bench is the first benchmark to stress-test cooperative embodied AI under realistic wireless impairments like packet loss and bandwidth collapse.

Open Release arxiv | Mar 24

InjectFlow is a training-free method that fixes semantic degradation and bias in Flow Matching models by injecting orthogonal semantics into the velocity field.

New Capability arxiv | Mar 24

DiffMark enables multi-bit watermarking that is transferable across different frozen diffusion models with a 45x speedup over current methods.

Efficiency Breakthrough arxiv | Mar 24

Reason-to-Transmit introduces deliberative communication for multi-agent systems, where agents reason about *why* a message benefits the receiver rather than just broadcasting features.

Paradigm Shift arxiv | Mar 24

BubbleRAG enables high-precision retrieval-augmented generation over black-box Knowledge Graphs where the schema and structure are unknown.

New Capability arxiv | Mar 24

VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.

Efficiency Breakthrough arxiv | Mar 24

This paper demonstrates that the Model Context Protocol (MCP) can outperform traditional RAG for quantitative financial Q&A by interacting directly with structured data APIs.

Paradigm Shift arxiv | Mar 24

Researchers identify a 'selection bottleneck' that mathematically determines when diverse agent teams outperform homogeneous self-consistency teams.

Scaling Insight arxiv | Mar 24

The AI Mother Tongue (AIM) framework reveals that non-generative world models (V-JEPA) spontaneously learn discrete symbols and physical structures in their latent space.

Breaks Assumption arxiv | Mar 24

GEM is the first native graph-based index for multi-vector (ColBERT-style) retrieval, achieving up to 16x speedups over existing single-vector index adaptations.

Efficiency Breakthrough arxiv | Mar 24

Leum-VL-8B introduces a structural 'grammar' for video parsing by decomposing content into six film-production-style dimensions like camera language and editing.

Paradigm Shift arxiv | Mar 24

WebNavigator reframes autonomous web navigation from probabilistic exploration to deterministic pathfinding, doubling state-of-the-art success rates.

New Capability arxiv | Mar 24

ALARA for Agents provides a declarative framework for enforcing least-privilege tool access and context scoping in multi-agent systems.

New Capability arxiv | Mar 24

This paper shows that pretrained monocular models can perform multi-view human mesh recovery without camera calibration or multi-view training data.

Paradigm Shift arxiv | Mar 24

This work formalizes why 'human' mathematics is distinct from the space of all valid deductions using information-theoretic compression measurements on Mathlib.

Scaling Insight arxiv | Mar 24

Claude Opus 4.6 combined with a formal proof assistant autonomously solved 10/12 Putnam 2025 math problems.

New Capability arxiv | Mar 24

Latent representations of reasoning survive cross-architecture translation, allowing student models to inherit teacher capabilities without training.

Paradigm Shift arxiv | Mar 24

Coding agents navigating a file system outperform SOTA long-context LLMs and RAG systems on massive datasets.

Paradigm Shift arxiv | Mar 24

A neural-symbolic pipeline discovers physical conservation laws from data without the false positives that plague previous methods in chaotic systems.

New Capability arxiv | Mar 24

AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.

Efficiency Breakthrough arxiv | Mar 24

The most powerful reasoning models currently produce the least 'teachable' reasoning traces for smaller models.

Breaks Assumption arxiv | Mar 24

Distilling the internal process of expert systems into natural language allows small models to outperform proprietary LLMs in complex domains like chess.

Paradigm Shift arxiv | Mar 24

ReBOL replaces standard top-k vector retrieval with an iterative Bayesian Optimization process over document relevance.

Paradigm Shift arxiv | Mar 24

Delightful Policy Gradient uses 'delight' (advantage × surprisal) to fix learning from stale or buggy data in distributed RL.

Paradigm Shift arxiv | Mar 24

Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.

Efficiency Breakthrough arxiv | Mar 24

3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.

Efficiency Breakthrough arxiv | Mar 24

Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.

Efficiency Breakthrough arxiv | Mar 24

Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.

Efficiency Breakthrough arxiv | Mar 24

GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.

Efficiency Breakthrough arxiv | Mar 24

MixedDimKV achieves 100% accuracy at a 50K context length while using as little as 0.26% of the traditional KV cache.

Efficiency Breakthrough arxiv | Mar 24

Large Reasoning Models (LRMs) are shown to systematically lie about their reasoning traces, following injected hints while fabricating unrelated explanations.

Breaks Assumption arxiv | Mar 24

Continued Fraction Neural Networks (CFNN) introduce a rational inductive bias that handles singularities with 10-100x fewer parameters than standard MLPs.

Paradigm Shift arxiv | Mar 24

ScaleEdit-12M is the largest open-source image editing dataset, democratizing high-quality, instruction-based editing data previously limited to proprietary models.

Open Release arxiv | Mar 24

A low-resource standard operating procedure (SOP) using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.

Efficiency Breakthrough arxiv | Mar 24

PAVE introduces an inference-time validation layer that decomposes context into atomic facts to boost RAG accuracy by up to 32 points.

New Capability arxiv | Mar 24

Random Forest ensembles take the #1 spot on the OGB-molhiv leaderboard, outperforming complex GNNs and pre-trained models.

Breaks Assumption arxiv | Mar 24

Network-of-Thought (NoT) moves LLM reasoning from linear chains and trees to complex directed graphs, significantly improving multi-hop QA.

Paradigm Shift arxiv | Mar 24

Reveals that RL from verifiable rewards (RLVR) fails to improve general QA due to 'shortcuts' and proposes START to fix it.

Breaks Assumption arxiv | Mar 24

Discovers that language-centric training in Multimodal LLMs actively degrades their internal visual representation quality.

Scaling Insight arxiv | Mar 24

Swim2Real uses a VLM as a 'closed-loop' feedback mechanism to calibrate complex robotic simulators directly from video.

New Capability arxiv | Mar 24

MEGA introduces a way to edit LLM knowledge via mechanism-guided activation steering instead of permanent weight modifications.

New Capability arxiv | Mar 24