EFFICIENCY_BREAKTHROUGH

375 papers · Page 3 of 4

Prompt Replay speeds up GRPO training by selectively reusing 'medium-difficulty' prompts to maximize learning signal in RL rollouts.

AI & ML arxiv | Mar 24
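A minimal sketch of the selection idea (function name and the 0.2–0.8 band are illustrative assumptions, not from the paper): prompts whose rollouts all pass or all fail give near-zero group-relative advantage in GRPO, so only mid-band prompts are worth replaying.

```python
def select_replay_prompts(pass_rates, low=0.2, high=0.8):
    # Too-easy (all rollouts pass) and too-hard (all fail) prompts carry
    # almost no GRPO signal: group-relative advantages collapse when every
    # rollout in the group receives the same reward.
    return sorted(p for p, r in pass_rates.items() if low <= r <= high)

# Hypothetical pass rates measured over previous rollouts.
rates = {"p1": 0.0, "p2": 0.45, "p3": 1.0, "p4": 0.7}
replay = select_replay_prompts(rates)  # → ["p2", "p4"]
```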

Breaks the massive compute barrier for medium-range weather forecasting, training on a single consumer-grade GPU.

AI & ML arxiv | Mar 24

An autonomous agent loop that optimizes GPU kernels to outperform human-expert and compiler-generated baselines.

AI & ML arxiv | Mar 24

Introduces AgentHER, a framework that salvages 'failed' agent trajectories by relabeling them as successful demonstrations for alternative goals.

AI & ML arxiv | Mar 24

TIDE is a post-training early-exit system that allows individual tokens to skip unnecessary layers, improving throughput by up to 8% with minimal calibration.

AI & ML arxiv | Mar 24

PivotRL identifies 'pivot' turns in agent trajectories where actions matter most, enabling compute-efficient reinforcement learning that matches end-to-end RL at 4x lower cost.

AI & ML arxiv | Mar 24

KG-Hopper enables 7B-parameter models to outperform 70B systems on complex Knowledge Graph reasoning by embedding the entire multi-hop process into a single 'thinking' stage.

AI & ML arxiv | Mar 24

Achieves state-of-the-art open-vocabulary segmentation using a training-free, purely geometric projection and propagation method.

AI & ML arxiv | Mar 24

Enables merging independently trained specialist models (e.g., Vision-LLM and Audio-LLM) into a single multimodal model without any paired training data.

AI & ML arxiv | Mar 24

SparseVoxelDet is the first fully sparse object detector for event cameras that never instantiates a dense tensor, achieving 858x GPU memory compression.

AI & ML arxiv | Mar 24

Confidence-Evidence Bayesian Gain (CEBaG) provides deterministic hallucination detection for medical VQA without requiring 10-20 stochastic generations.

AI & ML arxiv | Mar 24

Enables high-performance Zeroth-Order (ZO) fine-tuning of LLMs by leveraging online curvature signals.

AI & ML arxiv | Mar 24

Reduces token consumption in interleaved multimodal reasoning by over 72% using dynamic visual thoughts.

AI & ML arxiv | Mar 24

Eliminates the need for strictly aligned image pairs in infrared and visible image fusion.

AI & ML arxiv | Mar 24

Reduces human annotation requirements for NLP model testing by up to 95%.

AI & ML arxiv | Mar 24

Achieves a 50x reduction in visual tokens for Video-LLMs while preserving over 90% of baseline performance.

AI & ML arxiv | Mar 24

Introduces a learnable bridge between GELU and ReLU activations to enable deployment-friendly piecewise-linear networks.

AI & ML arxiv | Mar 24

Achieves a 75x parameter reduction in 3D medical image segmentation by hybridizing Mamba and Transformer modules.

AI & ML arxiv | Mar 24

Introduces a streaming detection head that stops Large Reasoning Models (LRMs) from 'overthinking' redundant steps.

AI & ML arxiv | Mar 24

Reduces the token count of Stable Diffusion 3.5 by 4x for high-resolution generation with minimal fine-tuning.

AI & ML arxiv | Mar 24

A predictive scheduling system for multi-agent workflows that optimizes serving across heterogeneous LLM clusters (mixing large and small models).

AI & ML arxiv | Mar 24

Enables high-rank (r=384) DoRA training on single GPUs through factored norms and fused Triton kernels.

AI & ML arxiv | Mar 24

Introduces a parallel reasoning mechanism for Vision-Language-Action (VLA) models that eliminates the latency bottleneck of autoregressive Chain-of-Thought.

AI & ML arxiv | Mar 24

A training-free feature caching framework that achieves 2.3x speedup for video world models while maintaining 99.4% quality.

AI & ML arxiv | Mar 24

A unified discrete diffusion framework that outperforms autoregressive models on large-scale discrete generation tasks for the first time.

AI & ML arxiv | Mar 24

Sparse Feature Attention (SFA) reduces attention costs from quadratic in sequence length and linear in dimension to a fraction based on feature sparsity, enabling 2.5x speedups.

AI & ML arxiv | Mar 25

Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.

AI & ML arxiv | Mar 25

Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.

AI & ML arxiv | Mar 25

Proposes an agentic architecture that achieves O(1) token complexity relative to dataset size by strictly separating intent parsing from deterministic data execution.

AI & ML arxiv | Mar 25

Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.

AI & ML arxiv | Mar 25

Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.

AI & ML arxiv | Mar 25

A 0.26M parameter model using continuous dynamics outperforms 27M parameter recursive models on complex logic tasks like Sudoku-Extreme.

AI & ML arxiv | Mar 25

Agile-VLA enables high-frequency robot control on edge devices by decoupling perception from action through implicit affordance anchoring.

AI & ML arxiv | Mar 25

EchoKV introduces a reversible KV cache compression scheme that allows LLMs to switch back to full-precision inference on-demand.

AI & ML arxiv | Mar 25

ForestPrune achieves up to 90% token reduction in video MLLMs with minimal accuracy loss using a training-free spatial-temporal forest modeling approach.

AI & ML arxiv | Mar 25

Optimizing autoregressive image models with Group Relative Policy Optimization (GRPO) achieves competitive quality without the 2x inference cost of Classifier-Free Guidance.

AI & ML arxiv | Mar 25

DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.

AI & ML arxiv | Mar 25

ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.

AI & ML arxiv | Mar 25

Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.

AI & ML arxiv | Mar 25

Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation on pure images.

AI & ML arxiv | Mar 25

Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.

AI & ML arxiv | Mar 25

An online length-aware scheduling strategy that eliminates training 'bubbles' during the rollout phase of LLM reinforcement learning.

AI & ML arxiv | Mar 25

Leverages human gaze tracking to assign non-uniform token density in diffusion models, creating perceptually perfect images with significantly less compute.

AI & ML arxiv | Mar 25

Replaces visual token compression with sparse, dynamically selected vision-language interactions in VLLMs.

AI & ML arxiv | Mar 25

Introduces on-the-fly quantization that calibrates to individual prompts during inference, solving the 'domain shift' problem where standard quantization fails on unseen data.

AI & ML arxiv | Mar 25

Memory Sparse Attention (MSA) enables LLMs to scale to 100 million tokens with linear complexity and less than 9% precision degradation.

AI & ML arxiv | Mar 26

The first sorting-free stochastic formulation for 3D Gaussian Splatting that matches rasterization speed while enabling full ray-traced effects.

AI & ML arxiv | Mar 26

AI agent benchmark costs can be cut by ~50% by evaluating only on tasks with intermediate historical pass rates.

AI & ML arxiv | Mar 26

Hybrid Distillation Policy Optimization (HDPO) overcomes the 'vanishing gradient' problem for hard mathematical prompts that RL agents cannot solve.

AI & ML arxiv | Mar 26

A self-distillation method for Multi-Token Prediction (MTP) that yields a 220% inference speedup with minimal training cost.

AI & ML arxiv | Mar 26

AttentionPack achieves up to 8x memory efficiency during decoding for large vision-language models (VLMs).

AI & ML arxiv | Mar 26

SLAT-Phys predicts spatially varying material property fields directly from single RGB images with a 120x speedup.

AI & ML arxiv | Mar 26

Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.

AI & ML arxiv | Mar 26

MoE-Sieve reduces Mixture-of-Experts LoRA fine-tuning parameters and training time by ~70% by adapting only the most frequently activated 'hot' experts.

AI & ML arxiv | Mar 26

Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.

AI & ML arxiv | Mar 26

Enables 1000x faster on-chip training for Weightless Neural Networks (WNNs) on FPGAs with drastically lower power consumption.

AI & ML arxiv | Mar 26

A 5M-parameter OCR model that rivals billion-parameter vision-language models, proving data-centric curation can beat raw parameter scale.

AI & ML arxiv | Mar 26

Achieves high-fidelity sub-seasonal weather forecasting with a 276M parameter model that matches 1.6B parameter baselines in accuracy and speed.

AI & ML arxiv | Mar 26

Agentic Variation Operators (AVO) replace fixed evolutionary heuristics with coding agents to discover GPU kernels that outperform FlashAttention-4 by 10.5%.

AI & ML arxiv | Mar 26

DreamerAD accelerates imagination-based training for autonomous driving by 80x, compressing 100-step diffusion sampling down to a single step.

AI & ML arxiv | Mar 26

The Multilevel Euler-Maruyama (ML-EM) method allows diffusion models to perform sampling at the computational cost of a single model evaluation.

AI & ML arxiv | Mar 26

Achieves 6x compute reduction in Multimodal LLMs while actually improving accuracy by 2%.

AI & ML arxiv | Mar 27

Reconstructs entire Spiking Neural Networks into a single neuron via temporal multiplexing.

AI & ML arxiv | Mar 27

Introduces a stable backpropagation-free training framework for physical and photonic neural networks.

AI & ML arxiv | Mar 27

Achieves state-of-the-art vision-language pretraining using 300x less data than leading methods.

AI & ML arxiv | Mar 27

Enables 10x faster robot trajectory generation by distilling diffusion models into movement primitives.

AI & ML arxiv | Mar 27

Speeds up RL-based reasoning training by 1.7x using an online quality head to prune failing rollouts mid-generation.

AI & ML arxiv | Mar 27

Sparton is a specialized Triton kernel that solves the massive memory bottleneck of Learned Sparse Retrieval (LSR) models like Splade.

AI & ML arxiv | Mar 27

A fully differentiable agent-based traffic simulator enables calibration and control of million-vehicle networks 173x faster than real-time.

AI & ML arxiv | Mar 27

GIFT is a training-free frame selection framework that uses 'Directed Diversity' to boost Video-LLM performance by up to 12.5%.

AI & ML arxiv | Mar 27

Photon enables efficient 3D medical volume understanding through adaptive token scheduling and a novel 'gradient restoration' backpropagation rule.

AI & ML arxiv | Mar 27

Pruning low-utility prompts before RL rollouts allows for 10x more efficient training of large reasoning models.

AI & ML arxiv | Mar 27

Simple image sharpening serves as a surrogate-free, zero-cost preemptive defense against adversarial attacks.

AI & ML arxiv | Mar 27

A new tokenization architecture reduces the 'Token Tax' for complex non-Latin scripts by over 60%.

AI & ML arxiv | Mar 27

GlowQ introduces group-shared low-rank approximations to speed up quantized LLM inference by up to 37%.

AI & ML arxiv | Mar 27

Reduces LLM inference energy by 40% (and up to 81%) using a distillation-based router to skip unnecessary reasoning steps.

AI & ML arxiv | Mar 27

Unlocks full-body musculoskeletal humanoid training by achieving order-of-magnitude speedups via massively parallel GPU simulation.

AI & ML arxiv | Mar 27

Achieves 45% performance gains in robotics using 5-10x fewer real-world demonstrations through high-dimensional factorization.

AI & ML arxiv | Mar 27

Achieves up to 4.7x speedup for Diffusion LLMs using a training-free self-speculative decoding framework.

AI & ML arxiv | Mar 27

Generates 2-minute 480p videos on a single H200 GPU through a hierarchical KV-cache strategy that compresses context by 32x.

AI & ML arxiv | Mar 27

Enables 4K novel view synthesis in a feed-forward manner by decoupling geometric complexity from rendering resolution.

AI & ML arxiv | Mar 27

Demonstrates that general-purpose coding agents can achieve 20x speedups in hardware design optimization without domain-specific training.

AI & ML arxiv | Mar 27

A training-free enhancement that unlocks multi-scale synergies in Vision Foundation Models (VFMs) to boost performance across various tasks.

AI & ML arxiv | Mar 27

Prunes 85% of visual tokens in Vision-Language-Action (VLA) models while retaining 94% accuracy for autonomous driving.

AI & ML arxiv | Mar 30

Extracts dense 3D Signed Distance Fields from images in under 3 seconds using feed-forward geometry transformer latents.

AI & ML arxiv | Mar 30

Parallelizes diffusion model sampling across multiple devices using a draft-and-refine process for up to 3.7x speedups.

AI & ML arxiv | Mar 30

Introduces a discrete-ratio selector for context compression that solves the problem of variable information density in long-form text.

AI & ML arxiv | Mar 30

Achieves state-of-the-art video understanding without the need for expensive human-annotated Chain-of-Thought (CoT) data.

AI & ML arxiv | Mar 30

Releases a composable, Optax-native stack that makes high-overhead second-order optimization methods (like K-FAC) practical and swappable.

AI & ML arxiv | Mar 30

Introduces a self-driven collaboration paradigm where an agent uses its own 'reflection' signals to escalate difficult tasks to a stronger model tier.

AI & ML arxiv | Mar 30

Achieves 16x prefill speedup for video models by using reinforcement learning to dynamically compress visual tokens based on temporal 'surprise'.

AI & ML arxiv | Mar 30

Demonstrates real-world robotic navigation policy training and deployment in under 120 minutes using only a consumer laptop and no human intervention.

AI & ML arxiv | Mar 30

Turns pretrained video diffusion models into high-efficiency codecs, achieving high-quality reconstruction at extremely low bitrates (below 0.002 bpp) without retraining.

AI & ML arxiv | Mar 30

Achieves competitive continual learning accuracy with a 90% reduction in memory cost.

AI & ML arxiv | Mar 31

Batch-level query routing for LLMs allows for strict cost and capacity control that per-query methods cannot achieve.

AI & ML arxiv | Mar 31

Achieves high-fidelity LiDAR densification in just 156ms while strictly enforcing sensor physics to prevent 'ghost points'.

AI & ML arxiv | Mar 31

Demonstrates that Liquid Neural Networks can outperform Diffusion Policies in imitation learning with half the parameters and nearly 2x faster inference.

AI & ML arxiv | Mar 31

Achieves a 45x reduction in video generation inference latency and 2.5x higher training throughput using an efficient solution-flow framework.

AI & ML arxiv | Mar 31

GSR-GNN achieves 30x training speedups and 87% memory reduction for deep Graph Neural Networks on circuit graphs.

AI & ML arxiv | Mar 31

Scales Maximum Entropy population synthesis from 20 to 50+ categorical attributes by replacing exact expectation sums with Persistent Contrastive Divergence.

AI & ML arxiv | Mar 31