AI & ML

Challenges the necessity of discrete action tokenizers in robotics by using a continuous, single-stage flow matching policy.

Paradigm Shift arxiv | Mar 31

Moves autonomous driving from 'predict-then-plan' to an interleaved VLA model where future frames and ego-actions are generated step-by-step.

New Capability arxiv | Mar 31

A non-Turing-complete DSL that compiles high-level LLM routing and agent policies directly into verified infrastructure artifacts like Kubernetes NetworkPolicies.

New Capability arxiv | Mar 31

Introduces a marketplace infrastructure that recasts AI agents from mere tools as peer participants in a verifiable production network.

Paradigm Shift arxiv | Mar 31

Scales Maximum Entropy population synthesis from 20 to 50+ categorical attributes by replacing exact expectation sums with Persistent Contrastive Divergence.

Efficiency Breakthrough arxiv | Mar 31

Reveals that the tight architectural coupling of image generation and understanding in unified models creates a new class of reciprocal safety vulnerabilities.

Breaks Assumption arxiv | Mar 31

Introduces a vision model testbed that aligns AI visual attention (scanpaths) with human gaze without sacrificing classification accuracy.

Paradigm Shift arxiv | Mar 31

Shows that standard task-completion benchmarks fail to distinguish agent capabilities and proposes 'Working Memory Fidelity' as a more predictive metric.

Scaling Insight arxiv | Mar 31

The first self-supervised, domain-agnostic model for LiDAR ground segmentation, eliminating the need for per-sensor manual labeling.

Open Release arxiv | Mar 31

A production-grade framework that converts LLM/RAG evaluation into a deployment decision workflow using Pareto frontiers and CI gates.

New Capability arxiv | Mar 31
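
As a rough illustration of the Pareto-frontier idea in the entry above, the sketch below filters evaluation results down to the non-dominated configurations and then applies a simple gate. The configuration names, scores, costs, and the `QUALITY_FLOOR` threshold are all made up for the example; this is not the framework's actual API.

```python
def pareto_frontier(candidates):
    """Return names of candidates not dominated on (higher quality, lower cost)."""
    frontier = []
    for name, quality, cost in candidates:
        # A candidate is dominated if another one is at least as good on both
        # axes and strictly better on at least one.
        dominated = any(
            q >= quality and c <= cost and (q > quality or c < cost)
            for _, q, c in candidates
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical evaluation results: (configuration, quality score, cost per 1k queries).
candidates = [
    ("small", 0.71, 0.40),
    ("medium", 0.80, 1.10),
    ("large", 0.79, 2.50),      # worse and pricier than "medium", so dominated
    ("large-rag", 0.88, 3.00),
]
frontier = pareto_frontier(candidates)

# A CI-gate style check: only frontier configurations above an assumed
# quality floor are eligible for deployment.
QUALITY_FLOOR = 0.75
passing = [n for n, q, _ in candidates if n in frontier and q >= QUALITY_FLOOR]
```

In a real pipeline the gate would presumably fail the build when `passing` is empty; that wiring is left out here.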

Collapses the standard vision backbone-plus-decoder architecture into a single early-fusion Transformer stack for both perception and task modeling.

Paradigm Shift arxiv | Mar 31

Couples visual representations directly into the RL optimization process (RLVR) for vision-language models using a structured reward reweighting mechanism.

Paradigm Shift arxiv | Mar 31

A unified framework for neural network recombination that achieves state-of-the-art fine-tuning with fewer than 200 parameters.

Efficiency Breakthrough arxiv | Mar 31

Enables Active Learning for tabular data without model retraining by iteratively optimizing the 'labeled context' of foundation models.

New Capability arxiv | Mar 31

Harmful intent in LLMs can be detected geometrically even after safety 'refusal' mechanisms have been surgically removed.

Breaks Assumption arxiv | Mar 31

For LLM-driven optimization, complex meta-heuristics like simulated annealing are unnecessary; simple greedy hill climbing is a superior default.

Breaks Assumption arxiv | Mar 31
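
For readers unfamiliar with the baseline named in the entry above, here is a minimal sketch of greedy hill climbing. The `propose` function is a hypothetical stand-in (a random perturbation) for whatever an LLM would suggest, and the objective is a toy; neither comes from the paper.

```python
import random

def greedy_hill_climb(score, propose, start, steps=200, seed=0):
    """Keep a candidate only if it strictly improves the score."""
    rng = random.Random(seed)
    best, best_score = start, score(start)
    for _ in range(steps):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        # Greedy rule: unlike simulated annealing, a worse candidate is never accepted.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

# Toy objective with a single peak at x = 3 (illustrative only).
def score(x):
    return -(x - 3.0) ** 2

# Hypothetical stand-in for an LLM proposal: a small random perturbation.
def propose(x, rng):
    return x + rng.uniform(-0.5, 0.5)

best, best_score = greedy_hill_climb(score, propose, start=0.0)
```

Simulated annealing differs only in the acceptance step, where it sometimes keeps a worse candidate with a temperature-dependent probability.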

Mathematical proof that LayerNorm structurally reduces model complexity compared to RMSNorm due to its mean-centering geometry.

Scaling Insight arxiv | Mar 31
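
The "mean-centering geometry" in the entry above can be seen in a few lines. The sketch below uses the standard textbook definitions of both normalizations, with learnable gain and bias omitted; it illustrates the structural difference only and is not the paper's proof.

```python
import math

def layer_norm(x, eps=1e-5):
    """Subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """Divide by the root mean square; no centering step."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

x = [2.0, 4.0, 9.0]
ln_sum = sum(layer_norm(x))   # components always sum to (numerically) zero
rms_sum = sum(rms_norm(x))    # no such constraint
```

Because LayerNorm outputs always sum to zero, they lie in a hyperplane with one fewer degree of freedom than the input space, whereas RMSNorm only rescales the vector.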

Proposes 'Amdahl’s Law for AI,' proving that human effort in AI-assisted work is bottlenecked by the fraction of 'novel' tasks rather than agent capability.

Paradigm Shift arxiv | Mar 31
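
For context, the classical Amdahl's Law that the entry above alludes to is shown below. Mapping $p$ to the share of work an agent can accelerate and $1 - p$ to the share of novel tasks is an assumption made here for illustration; the paper's exact formulation is not given in this listing.

```latex
% Classical Amdahl's Law: p is the fraction of work that can be sped up,
% s is the speedup factor applied to that fraction.
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\lim_{s \to \infty} S(s) = \frac{1}{1 - p}.
```

Under that reading, if 20% of the work is novel ($1 - p = 0.2$), the overall speedup cannot exceed $5\times$ no matter how capable the agent becomes.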

Lie Generator Networks enable linear system identification with guaranteed physical stability and dissipation by construction rather than through loss penalties.

New Capability arxiv | Mar 31

GIFT bootstraps image-to-CAD generation by turning inference-time failures into synthetic training data, reducing inference compute by 80%.

Efficiency Breakthrough arxiv | Mar 31

A modular, JAX-based framework and taxonomy for Reinforcement Learning with Diffusion and Flow policies.

Open Release arxiv | Mar 31

Achieves high-quality 3D reconstruction and camera pose estimation from sparse views without any pre-trained priors or ground-truth annotations.

New Capability arxiv | Mar 31

Near-lossless KV cache compression using angular quantization in the Walsh-Hadamard domain at ~3.5 bits per element.

Efficiency Breakthrough arxiv | Mar 31
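
As background for the entry above, the sketch below implements a plain, unnormalized fast Walsh-Hadamard transform and shows how it spreads a single outlier across all coordinates. It does not implement the paper's angular quantization; only the transform itself is shown, and the input vector is invented for the example.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements h apart into their sum and difference.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

x = [1.0, 0.0, 0.0, 8.0]   # one large outlier in the last position
y = fwht(x)                # the outlier's magnitude is shared by every coordinate

# Applying the transform twice returns the input scaled by its length.
x_back = [v / len(x) for v in fwht(y)]
```

After the transform every coordinate has a similar magnitude, which is one common motivation for quantizing in a rotated domain instead of the raw one.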

Mechanistic analysis reveals that over-refusal and harmful-intent refusal in LLMs occupy distinct representation subspaces.

Breaks Assumption arxiv | Mar 31

Introduces 'Hidden Ads,' a new class of semantic backdoor attacks that inject promotional content into VLM responses based on natural user behavior.

New Capability arxiv | Mar 31

Shifts protein fitness optimization from continuous embeddings to discrete Quadratic Unconstrained Binary Optimization (QUBO).

Paradigm Shift arxiv | Mar 31

Introduces LongCat-Next, a 'Native Multimodal' model that treats vision and audio as first-class discrete tokens rather than language-centric attachments.

Paradigm Shift arxiv | Mar 31

Achieves zero-shot, prompt-free object removal in diffusion models purely through self-attention manipulation.

New Capability arxiv | Mar 31

VoxAnchor uses mmWave radar to authenticate speech by matching acoustics to physical throat vibrations.

New Capability arxiv | Mar 31

RAGent enables training-free, deployment-time human activity recognition for mmWave radar using agentic reasoning.

New Capability arxiv | Mar 31

Proposes SOL-Nav, which replaces raw visual features in navigation with structured language descriptions for LLM-based agents.

Paradigm Shift arxiv | Mar 31

Bridges the gap between free-form natural language and safety-critical UAV navigation using Signal Temporal Logic (STL) translation and repair.

New Capability arxiv | Mar 31

Sci-Mind introduces an 'Adversarial Cognitive Dialectic' where specialized agents debate to refine mathematical models.

Paradigm Shift arxiv | Mar 31

Achieves a 79,000x reduction in energy per inference for insulin dose calculation using Spiking Neural Networks (SNNs).

Efficiency Breakthrough arxiv | Mar 31

Introduces 'Umwelt Engineering,' the deliberate constraint of an agent's linguistic environment to improve reasoning.

Paradigm Shift arxiv | Mar 31

PRBench reveals that current top-tier coding agents have a 0% success rate in end-to-end physics paper reproduction.

Breaks Assumption arxiv | Mar 31

Introduces Composer, a paradigm that generates input-specific parameter adaptations at inference time to enable dynamic per-input model specialization.

Paradigm Shift arxiv | Mar 31

Kuaishou releases KAT-Coder-V2, an agentic coding model achieving state-of-the-art results on SWE-bench Verified through a 'Specialize-then-Unify' paradigm.

Open Release arxiv | Mar 31

Provides empirical evidence and a mechanistic explanation for why LoRA drastically reduces catastrophic forgetting in sequential fine-tuning compared to full fine-tuning.

Scaling Insight arxiv | Mar 31

TianJi is the first 'AI meteorologist' system capable of autonomously driving complex numerical models to verify physical hypotheses in atmospheric science.

New Capability arxiv | Mar 31

A controlled study proving that the temporal organization (curriculum) of multimodal data is a first-order variable in balancing reasoning vs. OCR capabilities.

Scaling Insight arxiv | Mar 31

SkyNet extends MuZero to partially-observable stochastic games by adding auxiliary belief-aware heads, significantly outperforming baselines in complex card games.

Paradigm Shift arxiv | Mar 31

Heracles uses a state-conditioned diffusion middleware to bridge precise motion tracking with generative recovery for humanoid robots.

New Capability arxiv | Mar 31

Sortify is the first fully autonomous LLM agent deployed in production for closed-loop recommendation ranking optimization.

New Capability arxiv | Mar 31

AutoStan demonstrates a CLI coding agent that autonomously builds and iteratively improves interpretable Bayesian models in Stan.

New Capability arxiv | Mar 31

Identifies emergent social risks in multi-agent systems, such as spontaneous collusion and conformity, that arise even when agents are not explicitly instructed to behave that way.

Breaks Assumption arxiv | Mar 31

Uses spectral decomposition of inverse dynamics to enable real-time planning of long-horizon robotic manipulation tasks (10+ contact modes).

Efficiency Breakthrough arxiv | Mar 31

Introduces SCOUT, a routing framework that intelligently selects which Image-to-3D reconstruction model to use based on input difficulty and cost constraints.

New Capability arxiv | Mar 31

GraySense enables geospatial object tracking using only encrypted network packet sizes without any access to raw video streams.

New Capability arxiv | Mar 31

KVSculpt moves beyond simple eviction/merging to optimize unconstrained KV pairs in continuous space for extreme cache compression.

Efficiency Breakthrough arxiv | Mar 31

A rigorous analysis of the AIMO 3 math competition reveals that raw model capability dominates inference-time prompt optimization by an order of magnitude.

Breaks Assumption arxiv | Mar 31

Wan-R1 successfully applies Group Relative Policy Optimization (GRPO) to flow-based video models to enable verifiable spatial reasoning.

New Capability arxiv | Mar 31

The eigenvalue tail index of a neural network's weight matrices serves as a near-perfect (R^2 = 0.984) diagnostic for label noise in the training data.

Scaling Insight arxiv | Mar 31

Poppy provides a training-free way to refine monocular surface normals using single-shot polarization measurements at test time.

New Capability arxiv | Mar 31

SAGE mitigates multimodal hallucinations by monitoring 'attention sinks' and dynamically modulating self-attention during the decoding process.

Efficiency Breakthrough arxiv | Mar 31

ATLAS-RTC introduces token-level runtime control that detects and corrects LLM drift from structured output contracts during the forward pass.

New Capability arxiv | Mar 31

Guardrails implements and flight-tests Control Barrier Functions on an F-16 fighter jet to enforce safety limits in real time.

New Capability arxiv | Mar 31

ITQ3_S achieves high-fidelity 3-bit LLM inference by using rotation-domain smoothing to eliminate the catastrophic precision loss caused by outliers.

Efficiency Breakthrough arxiv | Mar 31

The Physics-Guided Transformer (PGT) embeds physical priors (like diffusion and causality) directly into the self-attention mechanism via heat-kernel biases.

Paradigm Shift arxiv | Mar 31

Iterative Motion Imitation enables bicycle robots to perform unassisted front-flips by learning from initially 'impossible' reference motions.

New Capability arxiv | Mar 31

Proteina-Complexa unifies generative flow-based modeling with structure-based 'hallucination' to set a new SOTA in atomistic protein binder design.

New Capability arxiv | Mar 31

ExFusion enables Transformer models to gain the capacity of Mixture-of-Experts during training while remaining a standard dense model for deployment.

Efficiency Breakthrough arxiv | Mar 31

SARL improves reasoning models by rewarding the 'topology' of thoughts rather than just the final answer, enabling effective RL without ground-truth labels.

Paradigm Shift arxiv | Mar 31

Dataset Concentration (DsCo) achieves nearly lossless dataset reduction by aligning distributions via diffusion models, cutting storage and training costs by half.

Efficiency Breakthrough arxiv | Mar 31

Correlated Diffusion replaces independent noise with structured MCMC dynamics, enabling generative modeling on hyper-efficient probabilistic computers.

Paradigm Shift arxiv | Mar 31

This study challenges the common 'best practice' of atomic decomposition for LLM judges, showing that holistic evaluation is often superior at detecting incompleteness.

Breaks Assumption arxiv | Mar 31

An autonomous agent reveals that domain-specific molecular architectures are largely unnecessary; standard transformers with better tuning outperform custom designs.

Breaks Assumption arxiv | Mar 31

Decoupled language models reduce the compute required for OCR domain adaptation by 95% while matching SOTA transformer accuracy.

Efficiency Breakthrough arxiv | Mar 31

This paper clarifies that Diffusion Maps (DMAPs) are not actually a dimensionality reduction tool, but rather a spectral representation that requires specific combinations to form a chart.

Paradigm Shift arxiv | Mar 31

The first framework for bit-identical deep learning training that produces MD5-verified identical weights across independent runs.

New Capability arxiv | Mar 31

Drift-AR enables single-step (1-NFE) high-fidelity image generation by reinterpreting AR prediction entropy as a physical drifting field.

Efficiency Breakthrough arxiv | Mar 31

Meta-Harness automates the engineering of the 'code' surrounding LLMs, improving RAG and agent performance by optimizing retrieval and context management logic.

New Capability arxiv | Mar 31

ROVED reduces the expensive human feedback required for preference-based RL by up to 90% by leveraging vision-language embeddings and uncertainty filtering.

Efficiency Breakthrough arxiv | Mar 31

PhysNet embeds physical tumor growth dynamics directly into the latent feature space of a CNN, rather than just as a constraint on the output.

Paradigm Shift arxiv | Mar 31

This paper proves that reward hacking is a structural equilibrium of optimized AI agents, not a bug, and provides a computable 'distortion index' to predict it.

Paradigm Shift arxiv | Mar 31

Moves VLM grounding from text-based coordinates to a direct visual token selection mechanism via special pointing tokens.

Paradigm Shift arxiv | Mar 31

Introduces Heddle, a trajectory-centric system that resolves the long-tail latency bottleneck of tool calls in agentic Reinforcement Learning.

Efficiency Breakthrough arxiv | Mar 31

Bypasses expensive formal verification solvers by designing neural networks that are 'verifiable by design' using the fast, trivial Lipschitz bound.

Paradigm Shift arxiv | Mar 31

A training-free metacognitive framework that gives LLMs explicit control over expanding, pruning, and repairing reasoning trajectories during inference.

New Capability arxiv | Mar 31

Presents PReD, the first foundation model and 1.3M-sample dataset specifically for electromagnetic signal perception and decision-making.

New Capability arxiv | Mar 31

Replaces traditional fixed-update rules in online learning with a causal Transformer to track switching experts in non-stationary environments.

Paradigm Shift arxiv | Mar 31

Replaces the classic Newton-Raphson power-flow solver with a differentiable GPU-accelerated simulation.

Efficiency Breakthrough arxiv | Mar 31

Transitions reasoning model optimization from coarse sequence-level advantages to fine-grained token dynamics.

New Capability arxiv | Mar 31

Moves beyond next-token prediction to model reasoning as gradient-based energy minimization over latent trajectories.

Paradigm Shift arxiv | Mar 31

Introduces lightweight equilibration to the Muon optimizer, significantly stabilizing and accelerating LLM pretraining.

Efficiency Breakthrough arxiv | Mar 31

Discovers that LLM hidden states undergo geometric 'warping' at digit-count boundaries, mimicking human psychological perception.

Scaling Insight arxiv | Mar 31

Enables instruction-following in low-resource languages by simply merging target language base models with English-instructed models.

Efficiency Breakthrough arxiv | Mar 31

Enhances Kolmogorov-Arnold Networks (KAN) with fractal interpolation to approximate non-smooth and rough functions.

New Capability arxiv | Mar 31

Exposes a massive robustness gap in Vision-Language-Action (VLA) models, where simple paraphrasing causes up to 50% success drops.

Breaks Assumption arxiv | Mar 31

An evolutionary framework for GPU kernel generation that outperforms frontier models like Claude 4.6 and Gemini 3.0.

Efficiency Breakthrough arxiv | Mar 31

HISA eliminates the quadratic O(L²) bottleneck in sparse attention indexers, enabling efficient long-context scaling for models like DeepSeek-V3.

Efficiency Breakthrough arxiv | Mar 31

Researchers have used LLMs to evolve entirely new Reinforcement Learning update rules from scratch that compete with human-designed baselines like PPO and SAC.

New Capability arxiv | Mar 31

The 'Scaffold Effect' reveals that Vision-Language Models in clinical settings often fabricate reasoning based on prompt framing rather than actual visual data.

Breaks Assumption arxiv | Mar 31

Entropic Claim Resolution (ECR) shifts RAG from retrieving 'relevant' documents to retrieving 'discriminative' evidence that minimizes hypothesis uncertainty.

Paradigm Shift arxiv | Mar 31

IsoQuant leverages SO(4) isoclinic rotations to achieve a 4.5x-4.7x speedup in low-bit KV-cache quantization over existing methods.

Efficiency Breakthrough arxiv | Mar 31

The 'Bidirectional Coherence Paradox' demonstrates that LLM performance and explanation quality can be inversely correlated depending on domain observability.

Paradigm Shift arxiv | Mar 31

COvolve creates an automated curriculum for open-ended learning by co-evolving environments and policies as executable code through a zero-sum game.

Paradigm Shift arxiv | Mar 31

INSID3 achieves state-of-the-art one-shot image segmentation using only frozen DINOv3 features without any training, fine-tuning, or auxiliary models.

Efficiency Breakthrough arxiv | Mar 31

EdgeDiT provides a hardware-aware blueprint for running massive Diffusion Transformers (DiT) on mobile NPUs with a 1.6x reduction in latency.

Efficiency Breakthrough arxiv | Mar 31

LAD achieves 3x lower latency than previous driving language models by generating textual reasoning and motion plans at up to 20 Hz.

Efficiency Breakthrough arxiv | Mar 31