AI & ML

1625 papers · Page 13 of 17

Spectral Edge Dynamics (SED) provides an early-warning signal for grokking, predicting generalization up to 1,700 steps before it occurs.

Scaling Insight arxiv | Mar 18

Transition Flow Matching learns a global transition flow rather than local velocity fields, enabling single-step generation and transfer to arbitrary future time points.

Paradigm Shift arxiv | Mar 18

Robot policy performance improves by up to 60% when a single 'golden ticket' constant noise vector replaces sampling from a Gaussian.

Breaks Assumption arxiv | Mar 18

Simulation Distillation (SimDist) enables rapid sim-to-real adaptation by transferring reward and value models directly into a latent world model.

Paradigm Shift arxiv | Mar 18

Demonstrates that massive scaling of diverse simulator resets can replace manual curriculum engineering for complex dexterous manipulation.

Scaling Insight arxiv | Mar 18

Reduces high-quality 3D head avatar creation time from over 24 hours to 0.5 seconds per frame.

Efficiency Breakthrough arxiv | Mar 18

Reveals that models with identical predictive performance produce fundamentally different feature attributions based solely on their hypothesis class.

Breaks Assumption arxiv | Mar 18

Introduces a privacy-preserving ML framework that achieves strong non-invertibility without the utility loss of Differential Privacy or the cost of Homomorphic Encryption.

Paradigm Shift arxiv | Mar 18

Fuses categorical sampling into the LM-head matmul to eliminate logit materialization and speed up LLM decoding by up to 19%.

Efficiency Breakthrough arxiv | Mar 18
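
The paper's exact kernel fusion is not shown here, but a standard route to sampling without materializing logits is the Gumbel-max identity: adding i.i.d. Gumbel noise to logits and taking the argmax draws exactly from the softmax distribution, so the vocabulary can be scanned tile by tile inside the matmul while keeping only a running maximum. A minimal NumPy sketch of that idea (tile size and shapes illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_tile_sample(h, W, tile=1024):
    """Sample a next token via the Gumbel-max trick, scanning the LM-head
    weight matrix one tile of columns at a time, so the full logit vector
    is never stored."""
    best_val, best_idx = -np.inf, -1
    for start in range(0, W.shape[1], tile):
        logits = h @ W[:, start:start + tile]          # partial logits for this tile only
        perturbed = logits + rng.gumbel(size=logits.shape)
        i = int(np.argmax(perturbed))
        if perturbed[i] > best_val:                    # keep a running argmax across tiles
            best_val, best_idx = perturbed[i], start + i
    return best_idx
```

Because only the running maximum crosses tile boundaries, peak memory is O(tile) rather than O(vocab), which is what makes fusing the sample into the matmul possible.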

Analyzes over 10,000 experiments to show that LLM agents are capable of genuine architectural discovery rather than mere hyperparameter tuning.

Paradigm Shift arxiv | Mar 18

Provides empirical evidence that structural sparsity in Vision Transformers does not lead to improved semantic interpretability.

Breaks Assumption arxiv | Mar 18

Demonstrates a complete AI-assisted mathematical research loop where a mathematician wrote zero lines of formal code to verify complex physics equilibria.

New Capability arxiv | Mar 18

Integrates LLM agents with the industry-standard Rosetta software to automate physics-based protein design for non-canonical amino acids.

New Capability arxiv | Mar 18

Releases 70B parameter models that operate entirely on bytes, effectively 'liberating' LLMs from static tokenizers.

Breaks Assumption arxiv | Mar 18

Derives closed-form power-law scaling for hyperparameters like learning rate and batch size using modern optimization theory rather than expensive empirical sweeps.

Scaling Insight arxiv | Mar 18
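
A closed-form prescription of this kind replaces a grid sweep with a formula: peak learning rate shrinks and batch size grows as a power of model scale. A toy sketch with illustrative constants (the exponents `alpha`, `beta` and anchors `eta0`, `b0`, `n0` are hypothetical, not the paper's fitted values):

```python
def scaled_hparams(n_params, eta0=3e-4, b0=256, n0=1e8, alpha=0.33, beta=0.5):
    """Hypothetical power-law schedule: given a parameter count, return
    (learning rate, batch size) by scaling anchor values at n0 params."""
    ratio = n_params / n0
    lr = eta0 * ratio ** (-alpha)      # learning rate decays with scale
    batch = int(b0 * ratio ** beta)    # batch size grows with scale
    return lr, batch
```

The point of the derivation is that such exponents can come from optimization theory rather than from fitting them to expensive empirical sweeps.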

Introduces per-token adapter routing, allowing a single sequence to dynamically utilize multiple specialized LoRA experts.

Paradigm Shift arxiv | Mar 18

Provides the first formal proof that safety is non-compositional, meaning two individually safe AI agents can become hazardous when combined.

Breaks Assumption arxiv | Mar 18

Enables the prediction of an adapter's task, performance, and attributes directly from its LoRA weights without any inference or data access.

New Capability arxiv | Mar 18

Finds that filtering knowledge at 'write-time' (ingestion) maintains 100% RAG accuracy under noise levels where standard 'read-time' filtering completely collapses.

Paradigm Shift arxiv | Mar 18

Proposes a protocol that replaces complex multi-agent coding frameworks with a simple, interpretable filesystem structure.

Paradigm Shift arxiv | Mar 18

Establishes a duality between sequence-axis attention and depth-wise residual connections, treating layer depth as an ordered variable.

Paradigm Shift arxiv | Mar 18

Achieves microsecond-level kinodynamic motion planning for high-DOF robots by using differential flatness to solve boundary value problems analytically.

Efficiency Breakthrough arxiv | Mar 18

Introduces ARISE, a hierarchical reinforcement learning framework that allows LLMs to evolve and reuse a tiered library of reasoning skills rather than treating every math problem in isolation.

New Capability arxiv | Mar 18

Challenges the standard use of bilinear/bicubic interpolation for upsampling saliency maps, proving it creates spurious importance regions and proposing a mass-redistribution alternative.

Breaks Assumption arxiv | Mar 18
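
The contrast is easy to make concrete: interpolation smears importance into pixels the model never attended to, while a mass-preserving alternative distributes each coarse cell's importance evenly over its block of fine pixels, conserving total saliency. A toy sketch of the conserving variant (a simplification, not the paper's exact redistribution scheme):

```python
import numpy as np

def mass_redistribute_upsample(sal, factor):
    """Upsample a saliency map by splitting each coarse cell's mass equally
    among its factor x factor block of fine pixels. Total importance is
    preserved and none leaks into neighboring cells, unlike bilinear/bicubic."""
    fine = np.repeat(np.repeat(sal, factor, axis=0), factor, axis=1)
    return fine / (factor * factor)
```

The conservation property (output sums to the input's total mass) is exactly what interpolation-based upsampling violates.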

Demonstrates that masked diffusion language models can be 21.8x more compute-efficient than traditional autoregressive models when scaled correctly.

Efficiency Breakthrough arxiv | Mar 18

Proposes the Vision-Sound-Language-Action (VSLA) paradigm, enabling robots to respond to real-time environmental acoustics during task execution.

New Capability arxiv | Mar 18

Debunks the widely held 'intra-modal misalignment hypothesis' which claimed CLIP embeddings are inherently poor for image-only tasks.

Breaks Assumption arxiv | Mar 18

Introduces Helium, a serving framework that treats agentic workflows as data query plans to optimize redundant LLM calls and KV caches.

Efficiency Breakthrough arxiv | Mar 18

Presents ZipCal, a model-agnostic calibration data selection strategy for pruning and quantization that is 240x faster than model-based methods.

Efficiency Breakthrough arxiv | Mar 18

Proves that compositional generalization failure in neural networks is an architectural issue and provides a category-theoretic framework to fix it.

Paradigm Shift arxiv | Mar 18

Discovers that skipping learning rate decay during pre-training, while appearing worse for pre-train loss, significantly improves the model's adaptability during supervised fine-tuning (SFT).

Breaks Assumption arxiv | Mar 18

Proves that noisy/incorrect labels are destructive to Reinforcement Learning with Verifiable Rewards (RLVR), contradicting recent high-profile claims that noise doesn't matter.

Breaks Assumption arxiv | Mar 18

Successfully trains a 0.9B parameter pure Spiking Neural Network (SNN) from scratch for language modeling, achieving performance without Transformer distillation.

New Capability arxiv | Mar 18

Formulates Hierarchical Instruction Following as a Constrained Markov Decision Process to ensure LLMs prioritize system prompts over user instructions.

Paradigm Shift arxiv | Mar 18

Localizes reinforcement learning updates for code generation by using execution traces to identify the exact point of semantic failure.

New Capability arxiv | Mar 18

Challenges the standard 'pretrain-then-finetune' pipeline by showing that repeating domain-specific data during pretraining is significantly more effective.

Breaks Assumption arxiv | Mar 18

A rigorous multi-method audit revealing that frontier LLM performance on MMLU is significantly inflated by data contamination and memorization.

Breaks Assumption arxiv | Mar 18

Introduces modular, composable safety alignment via learnable control tokens rather than static parameter-level tuning.

Paradigm Shift arxiv | Mar 18

Uses an asymmetric Draft-Verify-Recover pipeline to enable high-quality personalized AI assistants without compromising user privacy.

New Capability arxiv | Mar 18

A self-supervised RLVR method that escapes the 'spurious majority' trap by using a temporary unlearning process for exploration.

New Capability arxiv | Mar 18

Decouples perceptual failures from logical errors in Vision-Language reward models to enable more reliable test-time scaling.

Paradigm Shift arxiv | Mar 18

Researchers identified a 'critique vector' in the latent space of Large Reasoning Models that can be steered to improve self-correction and test-time scaling.

Paradigm Shift arxiv | Mar 18

Omnilingual MT scales machine translation to over 1,600 languages, an 8x increase in coverage over previous state-of-the-art systems.

New Capability arxiv | Mar 18

This paper demonstrates precise behavioral steering of agentic traits in a 35B parameter MoE model using Sparse Autoencoder (SAE) decoded probe vectors.

New Capability arxiv | Mar 18

FederatedFactory solves the 'extreme non-IID' problem in Federated Learning by federating generative priors instead of model weights.

Paradigm Shift arxiv | Mar 18

Laya introduces the first EEG foundation model based on Joint Embedding Predictive Architecture (JEPA), outperforming traditional reconstruction-based models.

Paradigm Shift arxiv | Mar 18

Introduces a method to give frozen LLMs persistent memory in their continuous latent space, bypassing the need for text-level RAG or retraining.

New Capability arxiv | Mar 18

IndexRAG shifts cross-document reasoning from inference-time prompting to offline indexing by generating 'bridging facts' at index time.

Paradigm Shift arxiv | Mar 18

Provides a theoretical framework for why training AI on what to avoid (negative constraints) is structurally superior and more stable than training on preferences.

Paradigm Shift arxiv | Mar 18

VQKV uses Vector Quantization to achieve over 80% KV cache compression with almost zero loss in model performance.

Efficiency Breakthrough arxiv | Mar 18
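
The core of any VQ cache scheme is replacing each stored vector with the index of its nearest centroid in a shared codebook: storage falls from d floats per vector to one small integer plus the codebook. A bare-bones nearest-centroid sketch (codebook learning, residuals, and per-head details omitted; this is not VQKV's actual pipeline):

```python
import numpy as np

def vq_encode(kv, codebook):
    """Map each KV vector (N, d) to the index of its nearest centroid (K, d)."""
    d2 = ((kv[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)   # (N, K) squared distances
    return np.argmin(d2, axis=1)

def vq_decode(codes, codebook):
    """Reconstruct approximate KV vectors by codebook lookup."""
    return codebook[codes]
```

With, say, d = 128 fp16 values (256 bytes) replaced by a 1-2 byte code, compression far above the quoted 80% is arithmetically available; the hard part the paper addresses is doing so with near-zero quality loss.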

Capability-Guided Compression uses Sparse Autoencoders (SAEs) to prevent 'capability loss' during model pruning and quantization.

New Capability arxiv | Mar 18

A causal analysis reveals that LLMs often ignore their own intermediate reasoning (Chain-of-Thought) when making final decisions.

Breaks Assumption arxiv | Mar 18

FEAT is a linear-complexity foundation model designed specifically for extremely large-scale structured (tabular) data.

Efficiency Breakthrough arxiv | Mar 18

Kamino is a massively parallel GPU physics solver that natively supports complex kinematic loops and multi-body systems.

Open Release arxiv | Mar 18

Detects and mitigates Vision-Language Model hallucinations at inference time by analyzing visual attention entropy rather than text outputs.

New Capability arxiv | Mar 18

Provides a geometric 'manifold envelopment' framework to explain why unsupervised RL for mathematical reasoning often collapses and how to stabilize it.

Scaling Insight arxiv | Mar 18

Formalizes AI agent governance as 'policies on paths,' moving from static prompts to runtime enforcement of complex legal and safety constraints.

Paradigm Shift arxiv | Mar 18

Enables stable 4-bit microscaling (MXFP4) quantization for Multi-modal LLMs, which previously suffered from performance collapse.

Efficiency Breakthrough arxiv | Mar 18

Introduces a way to train Reward Models that generate 'transferable rubrics'—explicit scoring criteria that improve performance across different tasks and models.

New Capability arxiv | Mar 18

OmniSONAR scales cross-lingual sentence embeddings to over 1,500 languages across text, speech, code, and math in a single semantic space.

New Capability arxiv | Mar 18

Aligns a base model to a target model's behavior by optimizing the 'data mixture' weights instead of using RLHF or DPO.

Paradigm Shift arxiv | Mar 18

Achieves high-bandwidth, precise Cartesian control of a fully soft continuum robot, breaking the assumption that softness and precision are incompatible.

Breaks Assumption arxiv | Mar 18

Fine-tuning language models on journal publication records allows them to match or exceed human experts in judging 'scientific taste'—the ability to identify which research ideas are worth pursuing.

New Capability arxiv | Mar 18

This paper introduces a Markov-based discrete reasoning model that learns its own stopping criterion and can re-mask and correct its own mistakes.

Paradigm Shift arxiv | Mar 18

Fast-WAM proves that World Action Models do not actually need to generate future 'imagination' frames at test-time to achieve state-of-the-art performance in embodied control.

Breaks Assumption arxiv | Mar 18

The study provides a formal link showing that internal 'world model' representations in transformers are a direct byproduct of the predictive geometry of the training data.

Scaling Insight arxiv | Mar 18

Chain-of-thought (CoT) reasoning in Vision-Language Models systematically degrades the reliability of uncertainty estimates, making models dangerously overconfident.

Breaks Assumption arxiv | Mar 18

Low-precision optimizer states cause 'state staleness' where updates round back to stored values, but scheduled resets can fully recover performance loss.

Efficiency Breakthrough arxiv | Mar 18
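
The staleness failure mode is easy to reproduce with a toy quantizer: when the stored state's grid step exceeds twice the update size, every update rounds straight back to the old value and the state stops moving. A minimal illustration (the 0.1 grid stands in for a real low-precision format):

```python
import numpy as np

GRID = 0.1  # toy quantization step for the stored optimizer state

def store(x):
    """Round state onto a coarse grid, mimicking low-precision storage."""
    return np.round(x / GRID) * GRID

state = store(np.array([1.0]))
for _ in range(100):                 # 100 updates of +0.04 each...
    state = store(state + 0.04)      # ...but 1.04 rounds back to 1.0 every time
# Full precision would have reached 5.0; the quantized state is stuck at 1.0.
```

A scheduled reset (periodically recomputing the state in full precision, as the paper proposes) breaks this fixed point, which is why it can recover the lost performance.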

IQuest-Coder-V1 introduces a series of high-performance code models including a unique 'Loop' variant with a recurrent mechanism for efficiency.

Open Release arxiv | Mar 18

This method non-rigidly aligns inconsistent video diffusion frames into globally-consistent 3D pointclouds to enable high-quality environment reconstruction.

New Capability arxiv | Mar 18

Infrastructure-taught 3D perception uses static roadside sensors as unsupervised teachers for moving vehicles, eliminating the need for manual labels.

Paradigm Shift arxiv | Mar 18

pADAM is a unified generative framework that learns shared priors across heterogeneous multi-physics families (e.g., scalar diffusion to Navier-Stokes).

New Capability arxiv | Mar 18

The SOMP attack demonstrates that private training text can be reconstructed from shared gradients even at high batch sizes (up to B=128).

Breaks Assumption arxiv | Mar 18

TraceR1 uses a two-stage reinforcement learning framework to train multimodal agents to forecast entire trajectories before execution, rather than acting reactively.

Paradigm Shift arxiv | Mar 18

Zero-shot sim-to-real transfer for complex robotic manipulation is achievable using only synthetic simulated data at scale.

Breaks Assumption arxiv | Mar 18

Video models perform reasoning during the diffusion denoising steps rather than sequentially across video frames.

Paradigm Shift arxiv | Mar 18

Using the best-performing models as anchors for 'LLM-as-a-judge' evaluations significantly reduces the judges' correlation with human rankings.

Breaks Assumption arxiv | Mar 18

GIST achieves O(N) complexity for Graph Transformers while maintaining gauge invariance, enabling scaling to meshes with 750K nodes.

Efficiency Breakthrough arxiv | Mar 18

Intermittently resetting an agent to a fixed state significantly accelerates policy convergence in Reinforcement Learning.

Paradigm Shift arxiv | Mar 18

SOMA provides a unified, differentiable layer that bridges incompatible human body models like SMPL and SMPL-X in a single closed-form pass.

New Capability arxiv | Mar 18

LEAFE allows LLM agents to internalize feedback as actionable experience, enabling them to backtrack and recover from failures autonomously.

New Capability arxiv | Mar 18

Pretrained 3D generative models can be repurposed for high-quality part segmentation using less than 1% of the typical labeled data.

Efficiency Breakthrough arxiv | Mar 18

Neural PDE solvers are not learning general operators, but rather a family of solutions specifically indexed to the boundary conditions seen during training.

Breaks Assumption arxiv | Mar 18

DreamPlan fine-tunes Vision-Language planners entirely within the 'imagination' of a video world model, bypassing costly physical robot trials.

Paradigm Shift arxiv | Mar 18

SurgΣ is a massive open-source release of 5.98M multimodal conversations and foundation models for surgical intelligence.

Open Release arxiv | Mar 18

Turns out the math for how things cool down or rot works just fine even if time doesn't move forward.

Paradigm Challenge arxiv | Mar 17

Our computers are way slower than they should be because they're hardwired to think time only goes one way.

Practical Magic arxiv | Mar 17

Your satellite internet doesn't actually care about clouds—it’s just the hidden liquid water inside them that’s killing your signal.

Nature Is Weird arxiv | Mar 17

If someone hacks a self-driving car, the way it steers leaves a 'fingerprint' that's so weird the car can actually tell it's being hijacked.

Practical Magic arxiv | Mar 17

An AI just started cracking math problems about the laws of physics that have basically been bullying scientists for centuries.

Paradigm Challenge arxiv | Mar 17

There’s this 'impossible' crystal structure that lets you squeeze data down as small as you want without it ever breaking.

Nature Is Weird arxiv | Mar 17

There's this one weird number—the natural log of 3—that basically decides if a group will work together or descend into total chaos.

Nature Is Weird arxiv | Mar 17

When vanilla prices skyrocketed, farmers in Madagascar actually cleared *more* forest, killing the idea that getting richer helps the environment.

Nature Is Weird arxiv | Mar 17

The main tool we use to decide if science is 'true' was actually just a lazy shortcut invented to deal with all the new scientists after WWII.

Paradigm Challenge arxiv | Mar 17

Diffusion LLMs can match autoregressive (AR) reasoning performance by using AR-generated plans as globally visible scaffolds.

Paradigm Shift arxiv | Mar 17

Researchers identified just three specific attention heads that govern persona and style, enabling precise steering without degrading model coherence.

Breaks Assumption arxiv | Mar 17

Factual selection in LLMs is driven by rotational dynamics on a hypersphere rather than scalar magnitude shifts, with the behavior emerging suddenly at the 1.6B parameter mark.

Scaling Insight arxiv | Mar 17

The Spherical Kernel Operator (SKO) replaces dot-product attention with ultraspherical polynomials to bypass the saturation phenomenon that bottlenecks world models.

Paradigm Shift arxiv | Mar 17

Truncated-Reasoning Self-Distillation (TRSD) allows models to maintain accuracy even when their chain-of-thought traces are heavily shortened.

Efficiency Breakthrough arxiv | Mar 17

Sparse Autoencoders (SAEs) can be used to build retrieval models that outperform traditional vocabulary-based sparse retrieval in multilingual settings.

Paradigm Shift arxiv | Mar 17