AI & ML

1625 papers · Page 7 of 17

LLM-generated summaries can produce patient embeddings that are more 'portable' and robust to hospital distribution shifts than specialized clinical models.

Paradigm Shift arxiv | Mar 26

A systematic critique explaining why 'self-improving' generative optimization loops fail in production and how to fix them.

Breaks Assumption arxiv | Mar 26

SDZE enables the training of 10-million-dimensional Physics-Informed Neural Networks (PINNs) on a single GPU.

New Capability arxiv | Mar 26

Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.

Efficiency Breakthrough arxiv | Mar 26

Solves the 'vanishing gradient' problem in 3D Gaussian Splatting (3DGS) tracking by optimizing in the frequency domain using spectral moments.

New Capability arxiv | Mar 26

Restores editable, semantically layered structures from flattened vector graphics (SVGs/icons) by using generative completion to recover occluded geometries.

New Capability arxiv | Mar 26

MoE-Sieve reduces Mixture-of-Experts LoRA fine-tuning parameters and training time by ~70% by only adapting the most-frequently activated 'hot' experts.

Efficiency Breakthrough arxiv | Mar 26
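The "hot expert" selection idea can be sketched in a few lines: count how often each expert appears in the router's top-k choices over a batch, then adapt only the most-frequent fraction. This is a toy illustration with made-up shapes and thresholds, not MoE-Sieve's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy router logits for a batch of tokens over 8 experts (hypothetical shapes).
num_experts, num_tokens, top_k = 8, 1000, 2
router_logits = rng.normal(size=(num_tokens, num_experts))

# Count how often each expert lands in the top-k routing choices.
topk_idx = np.argsort(router_logits, axis=1)[:, -top_k:]
counts = np.bincount(topk_idx.ravel(), minlength=num_experts)

# Keep only the most-frequently activated ('hot') experts; in the paper's
# setting, LoRA adapters would be attached to these experts only.
hot_fraction = 0.3
num_hot = max(1, int(num_experts * hot_fraction))
hot_experts = np.argsort(counts)[::-1][:num_hot]
print("hot experts:", sorted(hot_experts.tolist()))
```

The `hot_fraction` here is an illustrative knob; the ~70% savings claimed above would come from the cold experts' adapters never being created or trained.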

Identifies that 'attention imbalance' across modalities and tokens drives object hallucinations and proposes a decoding-time rectification (AIR) to fix it.

New Capability arxiv | Mar 26

SOMA provides a plug-and-play memory and orchestration system that increases Vision-Language-Action (VLA) robot success rates by over 50% without fine-tuning.

New Capability arxiv | Mar 26

LLMpedia exposes a massive gap in LLM factuality by generating 1M articles from parametric memory, revealing that actual knowledge retrieval is 15%+ lower than multiple-choice benchmarks suggest.

Breaks Assumption arxiv | Mar 26

Proves that RLHF and DPO alignment cause 'response homogenization,' which effectively breaks standard sampling-based uncertainty estimation methods.

Breaks Assumption arxiv | Mar 26

Formalizes 'likelihood hacking,' a failure mode where RL-trained models learn to generate unnormalized probabilistic programs to artificially inflate rewards.

Paradigm Shift arxiv | Mar 26

Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.

Efficiency Breakthrough arxiv | Mar 26

Enables 1000x faster on-chip training for Weightless Neural Networks (WNNs) on FPGAs with drastically lower power consumption.

Efficiency Breakthrough arxiv | Mar 26

Provides a systematic blueprint for scaling Reinforcement Learning (RL) in LLMs using multi-turn synthetic data generation and difficulty-based curricula.

Scaling Insight arxiv | Mar 26

A model-agnostic framework to boost time-series forecasting by aligning internal representations with those of pretrained foundation models.

Paradigm Shift arxiv | Mar 26

Breaks the resolution and aspect ratio barriers of image diffusion models, enabling the generation of consistent 32K resolution images.

New Capability arxiv | Mar 26

Unifies input and predicted meshes under a shared topological framework to enable high-fidelity 3D reconstruction with sharp features.

Paradigm Shift arxiv | Mar 26

Releases a high-quality, 92K-sentence parallel dataset for Hindi-Sanskrit translation focusing on contemporary and spoken language.

Open Release arxiv | Mar 26

Quantifies an emergent 'self' in robots as an invariant subnetwork that persists across continual learning of variable tasks.

Paradigm Shift arxiv | Mar 26

Applies reinforcement learning with a cycle-consistency reward to drastically improve natural language to Lean4 autoformalization.

New Capability arxiv | Mar 26

A 5M-parameter OCR model that rivals billion-parameter vision-language models, proving data-centric curation can beat raw parameter scale.

Efficiency Breakthrough arxiv | Mar 26

Reformulates molecular discovery as an autonomous MCTS planning problem over executable chemical operations rather than just similarity-based prediction.

New Capability arxiv | Mar 26

Identifies a 'critical threshold' in human-AI symbiosis beyond which human capability collapses abruptly and irreversibly due to over-delegation.

Scaling Insight arxiv | Mar 26

Moves automated research from stateless linear pipelines to a persistent Research World Model that maintains a self-correcting knowledge graph of gaps and methods.

Paradigm Shift arxiv | Mar 26

Achieves high-fidelity sub-seasonal weather forecasting with a 276M parameter model that matches 1.6B parameter baselines in accuracy and speed.

Efficiency Breakthrough arxiv | Mar 26

Releases 55 hours of continuous 30fps expert human computer-use videos to address the 'missing ingredient' for desktop automation agents.

Open Release arxiv | Mar 26

Introduces a 'sorry-driven' formal decomposition that allows LLM agents to solve complex proofs by isolating and independently verifying subgoals.

Paradigm Shift arxiv | Mar 26

Reveals that self-distillation degrades out-of-distribution reasoning by suppressing 'epistemic verbalization' (the model's expression of uncertainty).

Breaks Assumption arxiv | Mar 26

Enforces hard incompressibility constraints in neural operators using spectral Leray projection, ensuring physically admissible fluid simulations.

Paradigm Shift arxiv | Mar 26

An autonomous agentic pipeline discovered novel white-box adversarial attacks that outperform existing methods by up to 300%.

New Capability arxiv | Mar 26

Agentic Variation Operators (AVO) replace fixed evolutionary heuristics with coding agents to discover GPU kernels that outperform FlashAttention-4 by 10.5%.

Efficiency Breakthrough arxiv | Mar 26

UI-Voyager achieves an 81.0% success rate on AndroidWorld, exceeding human-level performance in mobile GUI automation.

New Capability arxiv | Mar 26

LensWalk introduces a 'reason-plan-observe' loop that allows agents to dynamically control the temporal sampling and density of the videos they analyze.

Paradigm Shift arxiv | Mar 26

The Free-Market Algorithm (FMA) is a zero-parameter metaheuristic that discovers complex pathways in chemistry and economics through emergent supply-and-demand dynamics.

Paradigm Shift arxiv | Mar 26

VFIG enables high-fidelity conversion of rasterized technical figures into editable, scalable SVGs using a new 66K-pair dataset.

Open Release arxiv | Mar 26

MARCH eliminates 'LLM-as-a-judge' confirmation bias by using information asymmetry to force verification agents to check claims without seeing the original response.

Paradigm Shift arxiv | Mar 26

DreamerAD accelerates imagination-based training for autonomous driving by 80x, compressing 100-step diffusion sampling down to a single step.

Efficiency Breakthrough arxiv | Mar 26

The Multilevel Euler-Maruyama (ML-EM) method allows diffusion models to perform sampling at the computational cost of a single model evaluation.

Efficiency Breakthrough arxiv | Mar 26
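The classical multilevel Euler-Maruyama idea underlying this line of work can be sketched on a toy SDE: estimate an expectation with a cheap coarse discretization, then add coupled fine-minus-coarse corrections driven by the same Brownian increments. This is the generic MLMC-EM recipe on an Ornstein-Uhlenbeck process, not the paper's diffusion-model application:

```python
import numpy as np

rng = np.random.default_rng(0)

drift = lambda x: -x   # illustrative Ornstein-Uhlenbeck drift, not the paper's model
vol = lambda x: 0.5    # constant volatility

def em_path(x0, T, dws):
    """Euler-Maruyama with one step per Brownian increment in `dws`."""
    dt = T / len(dws)
    x = x0
    for dw in dws:
        x = x + drift(x) * dt + vol(x) * dw
    return x

def mlmc_estimate(x0=1.0, T=1.0, L=4, N=4000):
    """Multilevel estimator of E[X_T]: coarse base level plus coupled corrections."""
    est = np.mean([em_path(x0, T, rng.normal(scale=np.sqrt(T), size=1))
                   for _ in range(N)])
    for level in range(1, L + 1):
        nf = 2 ** level
        dt_f = T / nf
        n_samples = N // nf + 1
        corr = 0.0
        for _ in range(n_samples):
            dws = rng.normal(scale=np.sqrt(dt_f), size=nf)
            fine = em_path(x0, T, dws)
            # Coarse path reuses the same noise, two fine increments per step.
            coarse = em_path(x0, T, dws.reshape(-1, 2).sum(axis=1))
            corr += fine - coarse
        est += corr / n_samples
    return est

print(mlmc_estimate())  # close to exp(-1) ≈ 0.37, up to discretization bias
```

The coupling is what makes the correction terms cheap: fine and coarse paths share Brownian increments, so their difference has small variance and needs few samples.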

Wasserstein Parallel Transport provides a formal framework for counterfactual prediction in evolving probability distributions.

New Capability arxiv | Mar 26

An AI research agent fact-checks published mathematics papers, pinpointing the exact locations of errors in professional proofs.

Paradigm Challenge arxiv | Mar 25

An audit of code artifacts accompanying Nature papers finds that only about 10% run successfully as released, exposing a severe reproducibility gap.

Paradigm Challenge arxiv | Mar 25

Demonstrates a physical adversarial attack in which a printed patch the size of a drink coaster causes a robot to hand a person a knife instead of an apple.

Practical Magic arxiv | Mar 25

Shows that indirect prompt injection via ordinary inbox emails can covertly alter an AI assistant's behavior toward its user, with no visible sign of tampering.

Nature Is Weird arxiv | Mar 25

Achieves wired-cable-level latency and throughput over wireless links, with performance that remains stable regardless of the number of relay hops.

Practical Magic arxiv | Mar 25

Effective semantic alignment for low-resource languages can be achieved with only 10,000 noisy synthetic pairs, matching the performance of models trained on 1 million samples.

Breaks Assumption arxiv | Mar 25

Mechanistic interpretability reveals that LLMs possess 'affect reception' circuits that detect emotional content even when explicit keywords are removed.

Paradigm Shift arxiv | Mar 25

Sparse Feature Attention (SFA) reduces attention costs from quadratic in sequence length and linear in dimension to a fraction based on feature sparsity, enabling 2.5x speedups.

Efficiency Breakthrough arxiv | Mar 25
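A generic sparsification sketch in the spirit of the claim: score attention using only the k most important feature dimensions of the queries and keys. The importance heuristic and shapes below are illustrative assumptions, not SFA's actual mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_feature_attention(Q, K, V, k):
    """Attention scores computed over only the k largest-magnitude feature
    dimensions (a hedged sketch of feature-sparse attention, not the paper's SFA)."""
    # Rank feature dims by aggregate magnitude across queries and keys.
    importance = np.abs(Q).sum(axis=0) + np.abs(K).sum(axis=0)
    keep = np.argsort(importance)[-k:]
    # Score with the reduced feature set: cost scales with k, not full dim.
    scores = Q[:, keep] @ K[:, keep].T / np.sqrt(k)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

T, d, k = 16, 64, 8
Q, K, V = (rng.normal(size=(T, d)) for _ in range(3))
out = sparse_feature_attention(Q, K, V, k)
print(out.shape)  # (16, 64)
```

The quadratic-in-sequence-length term remains in this naive sketch; the paper's claimed savings presumably require sparsity in the score matrix itself as well.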

Hidden states in LLMs occupy a Riemannian submanifold where tokens form Voronoi regions, revealing a universal 'hourglass' intrinsic dimension profile across all tested models.

Scaling Insight arxiv | Mar 25

Forcing AI agents to use human-comprehensible language causes a 50% efficiency drop compared to their own 'inscrutable' communication protocols.

Breaks Assumption arxiv | Mar 25

Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.

Efficiency Breakthrough arxiv | Mar 25
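The sign-preservation constraint described above can be illustrated with a minimal uniform quantizer that refuses to let any nonzero delta cross zero. This is a sketch of the stated idea, not DAQ's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_delta_sign_preserving(delta, bits=4):
    """Uniformly quantize post-training weight deltas while guaranteeing that
    no nonzero delta flips sign (a toy sketch of the stated constraint)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(delta).max() / levels
    q = np.round(delta / scale)
    # Clamp away from the zero-crossing: a nonzero delta must keep its sign,
    # even if plain rounding would have sent it to zero or across it.
    q = np.where((delta > 0) & (q <= 0), 1, q)
    q = np.where((delta < 0) & (q >= 0), -1, q)
    return np.clip(q, -levels, levels) * scale

delta = rng.normal(scale=1e-3, size=1000)  # small deltas, as in post-training
dq = quantize_delta_sign_preserving(delta)
assert np.all(np.sign(dq[delta != 0]) == np.sign(delta[delta != 0]))
```

The clamp costs at most one quantization step of extra error per weight, which is the trade the summary's "optimizing for sign preservation" framing points at.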

Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.

Efficiency Breakthrough arxiv | Mar 25

Small adapters can provide frozen decoder-only LLMs with persistent latent-space memory that survives across separate sessions.

New Capability arxiv | Mar 25

The standard 'Chinchilla Approach 2' for fitting scaling laws is systematically biased, potentially leading to millions of dollars in wasted compute at frontier scales.

Scaling Insight arxiv | Mar 25

Gradient boosting exhibits a 'first-mover bias' where correlated features selected early in the tree sequence gain an artificial, self-reinforcing importance in SHAP rankings.

Paradigm Shift arxiv | Mar 25

Introduces a framework for LLMs to self-improve reasoning in specific domains by autonomously mining and constructing training environments directly from the open web.

New Capability arxiv | Mar 25

Establishes a formal mathematical equivalence between Classifier-Free Guidance (CFG) and alignment-based objectives, allowing for CFG-like quality without inference-time overhead.

Paradigm Shift arxiv | Mar 25

Proposes an agentic architecture that achieves O(1) token complexity relative to dataset size by strictly separating intent parsing from deterministic data execution.

Efficiency Breakthrough arxiv | Mar 25
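The separation described above is easy to see in miniature: the model sees only the schema and the question (constant token cost), emits a small structured intent, and deterministic code does the data work. All names and the intent format below are hypothetical stand-ins:

```python
import json

dataset = [
    {"city": "Oslo", "temp": -2},
    {"city": "Cairo", "temp": 31},
    {"city": "Lima", "temp": 19},
]

# Hypothetical structured intent an LLM might emit for
# "Which cities are warmer than 10 degrees?" -- the model never sees the rows.
intent = json.loads(
    '{"filter": {"field": "temp", "op": ">", "value": 10}, "select": "city"}'
)

OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b, "==": lambda a, b: a == b}

def execute(intent, rows):
    """Deterministic execution: token cost is zero regardless of len(rows)."""
    f = intent["filter"]
    hits = [r for r in rows if OPS[f["op"]](r[f["field"]], f["value"])]
    return [r[intent["select"]] for r in hits]

print(execute(intent, dataset))  # ['Cairo', 'Lima']
```

Because the prompt never grows with the dataset, token complexity stays O(1) in dataset size, which is the property the summary highlights.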

Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.

Efficiency Breakthrough arxiv | Mar 25

Finds that nominal instruction-tuning with LoRA often fails to improve (and can even degrade) verifiable instruction-following despite improvements on broader benchmarks.

Breaks Assumption arxiv | Mar 25

Shifts symbolic regression from discrete genetic search to a continuous, embedding-driven optimization paradigm.

Paradigm Shift arxiv | Mar 25

Reveals that RLVR-driven reasoning improvements in LLMs are the result of highly sparse changes to a tiny fraction of 'critical' token distributions.

Scaling Insight arxiv | Mar 25

Identifies that the full source code (skill body) of a tool is the primary signal for LLM tool selection, far outweighing the importance of descriptions or metadata.

Breaks Assumption arxiv | Mar 25

Replaces standard autoregressive document OCR with a parallel diffusion-based denoising framework.

Paradigm Shift arxiv | Mar 25

Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.

Efficiency Breakthrough arxiv | Mar 25

Demonstrates that Hebbian plasticity can induce emergent attractor dynamics in robot controllers, enabling rapid adaptation without backpropagation.

Paradigm Shift arxiv | Mar 25

Uncovers that neural operator digital twins are acutely vulnerable to sparse adversarial perturbations on boundary conditions that bypass standard anomaly detection.

Breaks Assumption arxiv | Mar 25

Leverages unstructured clinical notes during training to boost the performance of models that are deployed using only structured EHR data.

New Capability arxiv | Mar 25

Finds that bipedal robot mass scales with the square of leg length, rather than the cubic scaling observed in biological systems.

Scaling Insight arxiv | Mar 25

CanViT is the first task-agnostic active-vision foundation model that reconstructs scenes using low-resolution 'glimpses' with 19.5x fewer FLOPs than existing models.

New Capability arxiv | Mar 25

A large-scale study of 12 reasoning models reveals that internal 'thinking' processes frequently recognize deceptive hints while the final output remains sycophantic.

Breaks Assumption arxiv | Mar 25

Instead of using top-activating examples, this method steers Sparse Autoencoder (SAE) features in Vision-Language Models to let the model describe its own internal visual features.

Paradigm Shift arxiv | Mar 25

DeIllusionLLM introduces task-level autoregressive reasoning to prevent LLMs from hallucinating answers to ill-posed or faulty scientific questions.

Paradigm Shift arxiv | Mar 25

CAM3R is a camera-agnostic 3D reconstruction model that handles fisheye, panoramic, and pinhole imagery without requiring prior calibration.

New Capability arxiv | Mar 25

Inter-Layer Structural Encoders (ILSE) use Cayley graphs to aggregate features from all internal LLM layers, improving accuracy by up to 44% over final-layer-only predictions.

Paradigm Shift arxiv | Mar 25

Introduces the first high-performing open-source metric for per-sample AI music quality evaluation.

Open Release arxiv | Mar 25

Provides a massive 2.5M image-to-TikZ dataset and the first instruction-augmented dataset for geometric visual reasoning.

Open Release arxiv | Mar 25

A new statistical test that reliably detects whether a dataset was NOT used in an LLM's training corpus.

New Capability arxiv | Mar 25
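A hedged sketch of the general recipe behind such tests (not the paper's statistic): compare the model's per-example losses on the candidate dataset against losses on known-unseen reference data; if the candidate's losses are not significantly lower, we fail to find evidence of training membership. The losses here are simulated stand-ins for real model evaluations:

```python
import math
import random

random.seed(0)

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Simulated per-example log-losses (stand-ins for real model evaluations).
unseen_ref = [random.gauss(3.0, 0.4) for _ in range(200)]
candidate = [random.gauss(3.0, 0.4) for _ in range(200)]  # drawn as if unseen

t = welch_t(candidate, unseen_ref)
print(f"t = {t:.2f} (large negative t would suggest membership)")
```

Note the asymmetry the summary emphasizes: a non-significant result here supports non-membership, which is the harder and more useful direction to certify.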

Introduces Dual Q-DM, the first non-adversarial imitation learning method theoretically guaranteed to eliminate compounding errors.

Paradigm Shift arxiv | Mar 25

A quantitative model that predicts the performance gain of merging independent LLM specialists before committing compute.

Scaling Insight arxiv | Mar 25

Proves that logic and lookup-table (LUT) based neural networks are structurally more resilient to hardware bit-flips than standard architectures.

Breaks Assumption arxiv | Mar 25

Identifies the 'Caterpillar Tree' as the theoretically optimal structure for test-time computation and backtracking in LLMs.

Scaling Insight arxiv | Mar 25

ABSTRAL automates the design of multi-agent systems by treating architectures as evolving, inspectable natural-language documents.

New Capability arxiv | Mar 25

Frontier models' reasoning steps are largely 'decorative' and do not causally determine the final answer in most tasks.

Breaks Assumption arxiv | Mar 25

Moving beyond coarse reward signals, this paper introduces token-level policy optimization for multimodal reasoning.

Paradigm Shift arxiv | Mar 25

UniQueR reconstructs full 3D scenes (including occluded areas) from unposed images in a single forward pass.

New Capability arxiv | Mar 25

Persistent structural memory in neural networks is fundamentally limited by the instability of jointly-learned coordinate systems.

Scaling Insight arxiv | Mar 25

Deep semi-parametric models allow for the instant deletion of training data from a model without retraining or parameter updates.

New Capability arxiv | Mar 25

A 0.26M parameter model using continuous dynamics outperforms 27M parameter recursive models on complex logic tasks like Sudoku-Extreme.

Efficiency Breakthrough arxiv | Mar 25

Standard confidence calibration is structurally biased when ground truth labels are ambiguous or annotators disagree.

Breaks Assumption arxiv | Mar 25

Agile-VLA enables high-frequency robot control on edge devices by decoupling perception from action through implicit affordance anchoring.

Efficiency Breakthrough arxiv | Mar 25

EchoKV introduces a reversible KV cache compression scheme that allows LLMs to switch back to full-precision inference on-demand.

Efficiency Breakthrough arxiv | Mar 25

ForestPrune achieves up to 90% token reduction in video MLLMs with minimal accuracy loss using a training-free spatial-temporal forest modeling approach.

Efficiency Breakthrough arxiv | Mar 25

Theoretical analysis reveals that the efficiency benefits of low-dimensional data structures for diffusion models diminish significantly when the data manifold is non-linear.

Scaling Insight arxiv | Mar 25

This paper moves LLMs from point predictions to set-valued predictions with rigorous statistical coverage guarantees.

Paradigm Shift arxiv | Mar 25

WorldMesh generates consistent, large-scale 3D worlds by populating a geometric mesh scaffold with image diffusion-derived content.

New Capability arxiv | Mar 25

Graph Foundation Models (GFMs) are shown to fail when using fixed architectural backbones, requiring a new approach of inference-time architecture adaptivity.

Breaks Assumption arxiv | Mar 25

Access to conversational memory allows an 8B model to outperform a 235B model on user-specific queries while reducing inference costs by 96%.

Scaling Insight arxiv | Mar 25

A rigorous evaluation shows that simple Probabilistic Circuits often outperform complex diffusion-based models for tabular data generation at a fraction of the cost.

Breaks Assumption arxiv | Mar 25

Optimizing autoregressive image models with Group Relative Policy Optimization (GRPO) achieves competitive quality without the 2x inference cost of Classifier-Free Guidance.

Efficiency Breakthrough arxiv | Mar 25
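The core of GRPO is its group-relative advantage: each sampled output is scored against the mean and standard deviation of its own group, removing the need for a learned value baseline. The reward values below are toy stand-ins; only the normalization is the standard GRPO computation:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group Relative Policy Optimization advantage: normalize each sample's
    reward against its own group's mean and std (standard GRPO normalization;
    the rewards themselves are toy stand-ins here)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# One prompt, a group of 4 sampled images scored by a toy reward model.
group_rewards = [0.2, 0.8, 0.5, 0.5]
adv = grpo_advantages(group_rewards)
print(np.round(adv, 2))  # advantages are zero-mean within the group
```

These advantages then weight the policy-gradient update, giving a CFG-free training signal at standard single-pass inference cost.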