Open-sources a high-fidelity foundation model that jointly generates synchronized video and audio using a unified single-stream Transformer.
Open Release arxiv | Mar 24
Introduces a learnable bridge between GELU and ReLU activations to enable deployment-friendly piecewise-linear networks.
Efficiency Breakthrough arxiv | Mar 24
Achieves a 75x parameter reduction in 3D medical image segmentation by hybridizing Mamba and Transformer modules.
Efficiency Breakthrough arxiv | Mar 24
Decouples high-level reasoning from low-level motor control in robotics using a visual prompting interface.
Paradigm Shift arxiv | Mar 24
Releases the first large-scale family of learned sparse retrieval (LSR) models specialized for code (up to 8B parameters).
Open Release arxiv | Mar 24
Introduces a streaming detection head that stops Large Reasoning Models (LRMs) from 'overthinking' redundant steps.
Efficiency Breakthrough arxiv | Mar 24
Proposes a test-time scaling paradigm for image restoration that allows compute-to-quality trade-offs during inference.
Paradigm Shift arxiv | Mar 24
Releases the hardware design and training environment for MEVIUS2, an open-source, Spot-scale quadruped robot.
Open Release arxiv | Mar 24
Proves that 'topic-matched' contrast pairs are ineffective for extracting refusal directions in LLM abliteration research.
Breaks Assumption arxiv | Mar 24
Provides a strictly controlled comparison of autoregressive vs. masked diffusion language models on identical compute budgets.
Scaling Insight arxiv | Mar 24
Ensures safe Vision-Language Model generation without over-refusal by steering activations within the null-space of benign inputs.
New Capability arxiv | Mar 24
Identifies that the direction of log-probability change is more critical than magnitude for improving LLM reasoning via RL.
Paradigm Shift arxiv | Mar 24
Integrates LLMs as closed-loop tuning experts for manufacturing robots to achieve 0% failure in complex 3D printing tasks.
New Capability arxiv | Mar 24
Reduces the token count of Stable Diffusion 3.5 by 4x for high-resolution generation with minimal fine-tuning.
Efficiency Breakthrough arxiv | Mar 24
Provides causal evidence that LLMs use internal confidence signals to drive behavioral decisions like abstention, rather than just as a side-effect of output generation.
Breaks Assumption arxiv | Mar 24
Identifies 'Visual Anchor Collapse' in DPO-aligned VLMs and introduces an asymmetric constraint to prevent models from ignoring visual evidence in favor of language priors.
Paradigm Shift arxiv | Mar 24
A predictive scheduling system for multi-agent workflows that optimizes serving across heterogeneous LLM clusters (mixing large and small models).
Efficiency Breakthrough arxiv | Mar 24
Introduces 'Noise Titration' to prove that current time-series foundation models often fail at structural inference, behaving instead as 'context parrots' during non-stationary shifts.
Breaks Assumption arxiv | Mar 24
Integrates auction bids and monetization logic directly into generative recommender systems (like TIGER) via bid-aware decoding.
New Capability arxiv | Mar 24
MemDLM embeds a simulated denoising process into training to create 'Parametric Memory,' narrowing the train-inference gap for Diffusion Language Models.
New Capability arxiv | Mar 24
An open foundation suite for universal dexterous robot control trained on over 50k trajectories across eight different robotic hand architectures.
Open Release arxiv | Mar 24
Bypasses Reinforcement Learning during the exploration phase by using uncertainty-guided tree search to discover informative data.
Paradigm Shift arxiv | Mar 24
Enables high-rank (r=384) DoRA training on single GPUs through factored norms and fused Triton kernels.
Efficiency Breakthrough arxiv | Mar 24
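For context, the standard DoRA reparameterization that such work builds on can be sketched in a few lines (a generic sketch of DoRA's magnitude/direction decomposition, not this paper's factored-norm or fused-kernel implementation; the shapes are illustrative):

```python
import numpy as np

# DoRA decomposes a weight update into a trainable per-column magnitude m
# and a direction given by the frozen base weight plus a low-rank update.
rng = np.random.default_rng(0)
d_out, d_in, r = 64, 32, 8                        # the paper uses r=384; small here

W0 = rng.standard_normal((d_out, d_in))           # frozen base weight
B = np.zeros((d_out, r))                          # trainable low-rank factors
A = rng.standard_normal((r, d_in)) * 0.01
m = np.linalg.norm(W0, axis=0, keepdims=True)     # trainable magnitude vector

def dora_weight(W0, m, B, A):
    """W' = m * (W0 + B @ A) / ||W0 + B @ A||_column."""
    V = W0 + B @ A
    return m * (V / np.linalg.norm(V, axis=0, keepdims=True))

# With B initialized to zero, the reparameterized weight equals the base weight,
# so training starts from the pretrained model exactly.
W = dora_weight(W0, m, B, A)
print(np.allclose(W, W0))  # True
```

The memory pressure at high rank comes from materializing the column norms of `W0 + B @ A` in the backward pass, which is what factored norms and fused kernels address.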
Introduces a parallel reasoning mechanism for Vision-Language-Action (VLA) models that eliminates the latency bottleneck of autoregressive Chain-of-Thought.
Efficiency Breakthrough arxiv | Mar 24
UNITE enables single-stage joint training of the tokenizer and the diffusion model from scratch, removing the need for frozen VAEs.
Paradigm Shift arxiv | Mar 24
A training-free feature caching framework that achieves 2.3x speedup for video world models while maintaining 99.4% quality.
Efficiency Breakthrough arxiv | Mar 24
A transformer-based meta-amortized framework that allows simulation-based inference to remain valid across different model structures without retraining.
New Capability arxiv | Mar 24
LassoFlexNet matches or beats leading tree-based models on tabular data while maintaining Lasso-like interpretability through per-feature embeddings and a group Lasso mechanism.
Paradigm Shift arxiv | Mar 24
Proves that rotation-invariant algorithms like standard Gradient Descent are fundamentally suboptimal for sparse targets when trained on hard labels.
Breaks Assumption arxiv | Mar 24
A grid-free probabilistic framework for nonrigid registration of high-dimensional vector-valued functions on irregular manifolds.
New Capability arxiv | Mar 24
A unified discrete diffusion framework that outperforms autoregressive models on large-scale discrete generation tasks for the first time.
Efficiency Breakthrough arxiv | Mar 24
The math we've used for 50 years to figure out how fast the internet should be is actually missing a giant piece of the puzzle.
Paradigm Challenge arxiv | Mar 23
You can get a whole crowd to agree on something even if everyone only knows what the person right next to them is thinking.
Nature Is Weird arxiv | Mar 23
Over 10% of new medical papers are being written by AI now—three years ago, that number was zero.
Nature Is Weird arxiv | Mar 23
We can now spot Alzheimer's early by looking at the brain like a building that’s literally buckling under the weight of toxic sludge.
Practical Magic arxiv | Mar 23
Massive wealth gaps might just be a math problem: if you always pick the better of two random options, inequality is basically guaranteed.
Nature Is Weird arxiv | Mar 23
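The dynamic described here, always keeping the better of two random draws, can be illustrated with a toy simulation (a generic sketch of the stated mechanism, not the paper's actual model; the agent count, noise scale, and round count are assumptions):

```python
import random
import math

def gini(w):
    """Gini coefficient of a list of non-negative wealths (0 = perfect equality)."""
    w = sorted(w)
    n = len(w)
    cum = sum(i * x for i, x in enumerate(w, 1))
    return (2 * cum) / (n * sum(w)) - (n + 1) / n

random.seed(0)
n_agents, n_rounds = 1000, 200
wealth = [1.0] * n_agents           # everyone starts identical

for _ in range(n_rounds):
    for i in range(n_agents):
        # Each agent sees two random multiplicative returns
        # and greedily takes the better one.
        a = math.exp(random.gauss(0.0, 0.1))
        b = math.exp(random.gauss(0.0, 0.1))
        wealth[i] *= max(a, b)

print(gini(wealth))  # well above 0: inequality emerges from identical rules
```

Even though every agent follows the same rule with the same odds, the variance of log-wealth grows with each round, so a heavy-tailed distribution (and a large Gini coefficient) is the generic outcome.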
Introduces a statistical alternative to the standard frequency-based BPE tokenization used in nearly all modern LLMs.
Paradigm Shift arxiv | Mar 23
Discovers a multiplicative scaling law governing how LLMs revise their beliefs during iterative reasoning (CoT, reflection).
Scaling Insight arxiv | Mar 23
Achieves state-of-the-art LLM distillation using 10-25% of the data required by standard fine-tuning.
Efficiency Breakthrough arxiv | Mar 23
Formally proves that a causal Transformer is mathematically equivalent to a stateless Differentiable Neural Computer.
Paradigm Shift arxiv | Mar 23
Accelerates MoE inference by speculating future experts to overlap CPU-GPU memory transfers with computation.
Efficiency Breakthrough arxiv | Mar 23
A self-improvement framework (MIPO) that improves LLM personalization and reasoning with zero additional data or human labels.
New Capability arxiv | Mar 23
Achieves 97% of Oracle reward performance using only 20% of the training labels for complex LLM reasoning.
Efficiency Breakthrough arxiv | Mar 23
The first Joint Embedding Predictive Architecture (JEPA) to train stably end-to-end from raw pixels with massive planning speedups.
Efficiency Breakthrough arxiv | Mar 23
Solves the compositional generalization failure of neural networks (0% to 100% accuracy) by embedding algebraic semiring constraints.
Paradigm Shift arxiv | Mar 23
A massive controlled study reveals that post-training algorithm rankings (DPO, SimPO, etc.) completely invert as models scale.
Scaling Insight arxiv | Mar 23
DAPA speeds up GELU computation by 16x and reduces hardware DSP utilization by 16x for on-device Transformer deployment.
Efficiency Breakthrough arxiv | Mar 23
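The general idea behind cheap on-device GELU, replacing the transcendental function with a small piecewise-linear lookup table, can be sketched as follows (a hypothetical illustration of the hardware-friendly approach, not DAPA's actual approximation; the table size and range are assumptions):

```python
import math

def gelu(x):
    """Exact GELU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Sample exact GELU on a small grid, then interpolate linearly between
# breakpoints -- a classic trick for avoiding DSP-heavy transcendentals.
LO, HI, N = -6.0, 6.0, 64
STEP = (HI - LO) / N
TABLE = [gelu(LO + i * STEP) for i in range(N + 1)]

def gelu_pwl(x):
    if x <= LO:
        return 0.0                  # GELU(x) ~ 0 for very negative x
    if x >= HI:
        return x                    # GELU(x) ~ x for large positive x
    i = min(int((x - LO) / STEP), N - 1)
    t = (x - (LO + i * STEP)) / STEP
    return TABLE[i] * (1 - t) + TABLE[i + 1] * t

max_err = max(abs(gelu(k / 100) - gelu_pwl(k / 100)) for k in range(-600, 601))
print(max_err)  # small: a 64-segment table is already accurate
```

A lookup plus one multiply-add per activation maps directly onto fixed-function hardware, which is where the DSP-utilization savings come from.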
Spectral Tempering achieves near-oracle embedding compression for dense retrieval without requiring any labeled data or grid searching.
Efficiency Breakthrough arxiv | Mar 23
Challenges the 80-year-old assumption that neurons must use weighted summation as their primary aggregation mechanism.
Paradigm Shift arxiv | Mar 23
Empirically proves that most Transformer layers are redundant, enabling a 54% training cost reduction through non-uniform budget allocation.
Efficiency Breakthrough arxiv | Mar 23
Warm-Start Flow Matching provides a guaranteed speedup for image/text generation by using lightweight models as initial priors.
Efficiency Breakthrough arxiv | Mar 23
VAMPO optimizes visual dynamics in video models using policy gradients to fix precision-critical errors in robotic manipulation.
New Capability arxiv | Mar 23
Debunks recent 'evaluation awareness' findings in LLMs by showing that linear probes are actually just tracking formatting artifacts.
Breaks Assumption arxiv | Mar 23
Introduces Hyperagents: self-referential systems where the meta-level modification logic is itself an editable program.
Paradigm Shift arxiv | Mar 23
Adaptive Layerwise Perturbation (ALP) solves the training-inference mismatch and importance ratio blowup in LLM reinforcement learning.
Efficiency Breakthrough arxiv | Mar 23
Fine-tunes Large Vision Language Models for medical tasks using only image-description pairs, bypassing the need for expensive expert-curated instructions.
Paradigm Shift arxiv | Mar 23
Introduces Any-Subgroup Equivariant Networks (ASEN), a single model that can adapt to multiple different symmetry groups via input modulation.
New Capability arxiv | Mar 23
ICLAD enables unified, in-context anomaly detection for tabular data across unsupervised, semi-supervised, and one-class regimes without weight updates.
New Capability arxiv | Mar 23
Expands formal reasoning beyond proof construction to the generation and formal verification of counterexamples in Lean 4.
New Capability arxiv | Mar 23
EvidenceRL uses reinforcement learning (GRPO) to explicitly optimize for evidence adherence, reducing hallucinations in high-stakes RAG pipelines.
Efficiency Breakthrough arxiv | Mar 23
MoCA3D predicts 3D bounding boxes from monocular images without requiring any camera intrinsics at inference time.
Breaks Assumption arxiv | Mar 23
Reveals that complex reasoning strategies like Chain-of-Thought (CoT) and Tree-of-Thought (ToT) provide negligible or even negative gains for text classification tasks.
Breaks Assumption arxiv | Mar 23
Formalizes the 'Neural Uncertainty Principle,' linking adversarial vulnerability in vision and hallucinations in LLMs to a shared geometric and information-theoretic origin.
Paradigm Shift arxiv | Mar 23
Accelerates diffusion-based image decoders by an order of magnitude using multi-scale sampling and one-step distillation.
Efficiency Breakthrough arxiv | Mar 23
CurveStream implements a curvature-aware hierarchical memory to handle streaming video in MLLMs without Out-of-Memory (OOM) errors.
New Capability arxiv | Mar 23
Proves the Key-Value (KV) cache is entirely redundant and can be bit-identically recomputed from the residual stream.
Breaks Assumption arxiv | Mar 23
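The core observation is easy to verify in miniature: keys and values are deterministic linear maps of the residual stream, so caching the stream is enough to rebuild the KV cache on demand (a toy single-layer sketch under assumed shapes, with layer norm omitted; not the paper's full pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_head, seq_len = 16, 4, 10
Wk = rng.standard_normal((d_model, d_head))   # key projection
Wv = rng.standard_normal((d_model, d_head))   # value projection
X = rng.standard_normal((seq_len, d_model))   # residual-stream activations

# Conventional KV cache, stored during generation.
K_cache = X @ Wk
V_cache = X @ Wv

# Recomputed later from the cached residual stream alone:
# the same deterministic matmul yields bit-identical results.
K_recomputed = X @ Wk
print(np.array_equal(K_cache, K_recomputed))  # True
```

The trade-off is compute for memory: the residual stream is one `d_model`-sized vector per token, versus a key and value per head per layer.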
Reduces covariance tracking error by 30x by reformulating the problem as rigid-body motion on Lie groups.
Efficiency Breakthrough arxiv | Mar 23
A massive field study (9,000+ users) proves that algorithmic shifts can reduce affective polarization without sacrificing user engagement.
Paradigm Shift arxiv | Mar 23
Achieves a 19x reduction in inference cost and 16x in latency for agentic workflows by evolving hybrid LLM-and-code pipelines.
Efficiency Breakthrough arxiv | Mar 23
Reduces long-context inference latency by 26.4x using a training-free, structure-aware prompt compression framework.
Efficiency Breakthrough arxiv | Mar 23
Boosts open-model agent performance on web navigation tasks from 6.4% to 43%, surpassing proprietary models like GPT-4o.
New Capability arxiv | Mar 23
Proves that intuitive task similarity is a poor predictor of training data value for MLLMs and offers a highly accurate training-free alternative.
Breaks Assumption arxiv | Mar 23
Enables zero-shot humanoid robot interaction by generating robot-centric 'dream' videos instead of relying on human-to-robot motion retargeting.
Paradigm Shift arxiv | Mar 23
Introduces the first reinforcement learning framework to compress implicit reasoning steps in looped language models.
Efficiency Breakthrough arxiv | Mar 23
Replaces fixed context compression ratios with a performance-floor constraint to ensure reliable LLM deployment.
Paradigm Shift arxiv | Mar 23
Achieves O(1) time complexity for dense component attribution in SwiGLU Transformers using a single forward-backward pass.
Efficiency Breakthrough arxiv | Mar 23
First unified pipeline to reconstruct complete geometry, materials, and lighting from sparse views in under one second.
New Capability arxiv | Mar 23
Introduces the first inherently scalable primitive for radiance fields, allowing real-time Level-of-Detail (LOD) rendering by simply truncating Fourier coefficients.
New Capability arxiv | Mar 23
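Why truncating Fourier coefficients gives a natural level-of-detail control can be seen in a 1-D analogy (an assumed illustration of the general principle, not the paper's radiance-field primitive):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
signal = np.cumsum(rng.standard_normal(n))   # a rough 1-D "scene"
coeffs = np.fft.rfft(signal)                 # stored representation

def render(k):
    """Reconstruct using only the lowest-frequency k coefficients."""
    kept = np.zeros_like(coeffs)
    kept[:k] = coeffs[:k]
    return np.fft.irfft(kept, n=n)

full = render(len(coeffs))                   # all coefficients: exact
coarse = render(16)                          # truncated: smooth, low detail
err_full = np.abs(full - signal).max()
err_coarse = np.sqrt(np.mean((coarse - signal) ** 2))
```

By Parseval's theorem, reconstruction error shrinks monotonically as more coefficients are kept, so a renderer can pick any quality/speed point by choosing where to cut the coefficient list, with no retraining or extra data structures.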
FIPO overcomes reasoning length stagnation in LLMs by using Future-KL divergence to create dense rewards, extending Chain-of-Thought lengths to over 10,000 tokens.
Paradigm Shift arxiv | Mar 23
A training-free method to fix intra-modal misalignment in CLIP by decomposing projectors into an isotropic aligned subspace.
Efficiency Breakthrough arxiv | Mar 23
NASimJax provides a 100x throughput increase for autonomous penetration testing simulators by reimplementing the environment in JAX.
Efficiency Breakthrough arxiv | Mar 23
SCRL introduces the first negative supervision mechanism for Test-Time Reinforcement Learning, preventing LLMs from reinforcing 'consensus lies'.
New Capability arxiv | Mar 23
SAGE achieves state-of-the-art translation for low-resource languages while reducing training data requirements by 97.1% via RL-guided curation.
Efficiency Breakthrough arxiv | Mar 23
Memori reduces agent token costs by 20x by replacing raw conversation history with a persistent layer of semantic triples and summaries.
Efficiency Breakthrough arxiv | Mar 23
2K Retrofit enables 2K-resolution inference for any 3D geometric foundation model without modifying or retraining the backbone.
Efficiency Breakthrough arxiv | Mar 23
X-World is a controllable, action-conditioned multi-camera world model that simulates realistic future video observations for end-to-end driving.
New Capability arxiv | Mar 23
Breaking the 'capability ceiling' in LLM post-training by replacing full-history dependencies with explicit Markov states.
Paradigm Shift arxiv | Mar 23
A k-means variant that is up to 7x faster than FAISS and Scikit-Learn on CPUs and 4x faster than cuVS on GPUs.
Efficiency Breakthrough arxiv | Mar 23
Reduces the computational cost of Neural Architecture Search for ensembles from O(M) to O(1).
Efficiency Breakthrough arxiv | Mar 23
Enables LLMs to explore beyond their current distribution during RL by treating failed trajectories as hindsight guidance.
New Capability arxiv | Mar 23
Identifies 'critical times' in diffusion generation where targeted guidance pulses significantly improve image control.
Paradigm Shift arxiv | Mar 23
Exposes fundamental flaws in using LLM-based agents to evaluate automated interpretability and model circuits.
Breaks Assumption arxiv | Mar 23
Replaces unstable free-form recursive LLM code with a typed functional runtime grounded in lambda-calculus.
New Capability arxiv | Mar 23
Derives a variational ELBO for the Joint-Embedding Predictive Architecture (JEPA), unifying it with generative modeling.
Paradigm Shift arxiv | Mar 23
Enables zero-shot, directed protein generation by applying a simple scalar bias to stochastic attention samplers.
New Capability arxiv | Mar 23
Demonstrates that LLM reasoning capabilities drop sharply when tasks are framed within multi-turn dialogues vs isolated benchmarks.
Breaks Assumption arxiv | Mar 23
A comprehensive end-to-end workflow for humanoid loco-manipulation that standardizes sim-to-real transfer.
New Capability arxiv | Mar 23
Quantifies LLM uncertainty in a single generation pass without auxiliary models or repeated sampling.
Efficiency Breakthrough arxiv | Mar 23
Demonstrates that current 'faithfulness' metrics for Chain-of-Thought reasoning are highly subjective and vary wildly depending on the choice of classifier.
Breaks Assumption arxiv | Mar 23
Introduces a long-horizon video agent that uses 93% fewer frames than GPT-5/standalone LMMs while achieving higher accuracy.
Efficiency Breakthrough arxiv | Mar 23