EFFICIENCY_BREAKTHROUGH
375 papers · Page 3 of 4
Prompt Replay speeds up GRPO training by selectively reusing 'medium difficulty' prompts to maximize learning signal in RL rollouts.
AI & ML arxiv | Mar 24
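A minimal sketch of why medium-difficulty prompts carry the most signal in GRPO-style training: the group-relative advantage is each reward minus the group mean, so it vanishes whenever all rollouts for a prompt succeed or all fail. The variance-based `replay_priority` scorer below is an illustrative assumption, not the paper's actual selection rule.

```python
def replay_priority(group_rewards):
    """Score a prompt by the variance of its rollout rewards.
    A group-relative advantage (reward minus group mean) is zero
    when every rollout succeeds or every rollout fails, so prompts
    with mixed outcomes carry the most gradient signal."""
    mean = sum(group_rewards) / len(group_rewards)
    return sum((r - mean) ** 2 for r in group_rewards) / len(group_rewards)

# trivially easy and impossibly hard prompts yield zero signal
print(replay_priority([1, 1, 1, 1]))   # → 0.0
print(replay_priority([1, 0, 1, 0]))   # → 0.25
```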
Breaks the compute barrier for medium-range weather forecasting by training on a single consumer-grade GPU.
AI & ML arxiv | Mar 24
An autonomous agent loop that optimizes GPU kernels to outperform human-expert and compiler-generated baselines.
AI & ML arxiv | Mar 24
Introduces AgentHER, a framework that salvages 'failed' agent trajectories by relabeling them as successful demonstrations for alternative goals.
AI & ML arxiv | Mar 24
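The relabeling trick echoes hindsight experience replay: a trajectory that failed its intended goal is a valid demonstration for whatever goal it did reach. A minimal sketch of that general idea (the `Step` record and `hindsight_relabel` helper are hypothetical, not AgentHER's API):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Step:
    state: str          # observation before the action
    action: str         # action the agent took
    goal: str           # goal the trajectory was attempting
    success: bool       # whether that goal was met

def hindsight_relabel(trajectory, achieved_goal):
    """Relabel a failed trajectory as a successful demonstration
    for the goal it actually achieved (HER-style relabeling)."""
    return [replace(step, goal=achieved_goal, success=True)
            for step in trajectory]

failed = [Step("s0", "a0", goal="book_flight", success=False),
          Step("s1", "a1", goal="book_flight", success=False)]
demo = hindsight_relabel(failed, achieved_goal="open_booking_page")
```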
TIDE is a post-training early-exit system that allows individual tokens to skip unnecessary layers, improving throughput by up to 8% with minimal calibration.
AI & ML arxiv | Mar 24
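A minimal sketch of token-level early exit under one possible design, where a per-token confidence margin is checked after each layer and later layers are skipped once it clears a threshold. The margin values and threshold here are made up; this illustrates the general mechanism, not TIDE's calibration procedure.

```python
def token_early_exit(logit_margins, threshold=2.0):
    """For each token, find the first layer whose top-1 logit margin
    exceeds a confidence threshold; layers after it are skipped for
    that token (post-hoc early-exit sketch)."""
    exits = []
    for margins in logit_margins:        # one list of per-layer margins per token
        exit_layer = len(margins)        # default: run every layer
        for layer, m in enumerate(margins):
            if m >= threshold:
                exit_layer = layer + 1
                break
        exits.append(exit_layer)
    return exits

# token 0 becomes confident at layer 2; token 1 never does (runs all 4)
print(token_early_exit([[0.5, 2.5, 3.0, 3.1], [0.1, 0.2, 0.3, 0.4]]))
# → [2, 4]
```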
PivotRL identifies 'pivot' turns in agent trajectories where actions matter most, enabling compute-efficient reinforcement learning that matches end-to-end RL at 4x lower cost.
AI & ML arxiv | Mar 24
KG-Hopper enables 7B-parameter models to outperform 70B systems on complex Knowledge Graph reasoning by embedding the entire multi-hop process into a single 'thinking' stage.
AI & ML arxiv | Mar 24
Achieves state-of-the-art open-vocabulary segmentation using a training-free, purely geometric projection and propagation method.
AI & ML arxiv | Mar 24
Enables merging independently trained specialist models (e.g., Vision-LLM and Audio-LLM) into a single multimodal model without any paired training data.
AI & ML arxiv | Mar 24
SparseVoxelDet is the first fully sparse object detector for event cameras that never instantiates a dense tensor, achieving 858x GPU memory compression.
AI & ML arxiv | Mar 24
Confidence-Evidence Bayesian Gain (CEBaG) provides deterministic hallucination detection for medical VQA without requiring 10-20 stochastic generations.
AI & ML arxiv | Mar 24
Enables high-performance Zeroth-Order (ZO) fine-tuning of LLMs by leveraging online curvature signals.
AI & ML arxiv | Mar 24
Reduces token consumption in interleaved multimodal reasoning by over 72% using dynamic visual thoughts.
AI & ML arxiv | Mar 24
Eliminates the need for strictly aligned image pairs in infrared and visible image fusion.
AI & ML arxiv | Mar 24
Reduces human annotation requirements for NLP model testing by up to 95%.
AI & ML arxiv | Mar 24
Achieves a 50x reduction in visual tokens for Video-LLMs while preserving over 90% of baseline performance.
AI & ML arxiv | Mar 24
Introduces a learnable bridge between GELU and ReLU activations to enable deployment-friendly piecewise-linear networks.
AI & ML arxiv | Mar 24
Achieves a 75x parameter reduction in 3D medical image segmentation by hybridizing Mamba and Transformer modules.
AI & ML arxiv | Mar 24
Introduces a streaming detection head that stops Large Reasoning Models (LRMs) from 'overthinking' redundant steps.
AI & ML arxiv | Mar 24
Reduces the token count of Stable Diffusion 3.5 by 4x for high-resolution generation with minimal fine-tuning.
AI & ML arxiv | Mar 24
A predictive scheduling system for multi-agent workflows that optimizes serving across heterogeneous LLM clusters (mixing large and small models).
AI & ML arxiv | Mar 24
Enables high-rank (r=384) DoRA training on single GPUs through factored norms and fused Triton kernels.
AI & ML arxiv | Mar 24
Introduces a parallel reasoning mechanism for Vision-Language-Action (VLA) models that eliminates the latency bottleneck of autoregressive Chain-of-Thought.
AI & ML arxiv | Mar 24
A training-free feature caching framework that achieves 2.3x speedup for video world models while maintaining 99.4% quality.
AI & ML arxiv | Mar 24
A unified discrete diffusion framework that outperforms autoregressive models on large-scale discrete generation tasks for the first time.
AI & ML arxiv | Mar 24
Sparse Feature Attention (SFA) reduces attention costs from quadratic in sequence length and linear in dimension to a fraction based on feature sparsity, enabling 2.5x speedups.
AI & ML arxiv | Mar 25
Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.
AI & ML arxiv | Mar 25
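A minimal sketch of sign-aware quantization of fine-tuning deltas, assuming a simple symmetric uniform quantizer. The snapping rule below illustrates the general sign-preservation idea named in the summary; DAQ's actual objective may differ.

```python
def sign_preserving_quantize(deltas, n_bits=4):
    """Uniform quantization that refuses to zero out a small delta:
    any value standard rounding would send to 0 is snapped to the
    smallest representable step with the original sign, so the
    direction of the post-training update survives quantization."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(d) for d in deltas) / qmax
    out = []
    for d in deltas:
        q = round(d / scale)
        if d != 0 and q == 0:
            q = 1 if d > 0 else -1    # smallest step, same sign
        out.append(q * scale)
    return out
```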
Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.
AI & ML arxiv | Mar 25
Proposes an agentic architecture that achieves O(1) token complexity relative to dataset size by strictly separating intent parsing from deterministic data execution.
AI & ML arxiv | Mar 25
Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.
AI & ML arxiv | Mar 25
Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.
AI & ML arxiv | Mar 25
A 0.26M-parameter model using continuous dynamics outperforms 27M-parameter recursive models on complex logic tasks like Sudoku-Extreme.
AI & ML arxiv | Mar 25
Agile-VLA enables high-frequency robot control on edge devices by decoupling perception from action through implicit affordance anchoring.
AI & ML arxiv | Mar 25
EchoKV introduces a reversible KV cache compression scheme that allows LLMs to switch back to full-precision inference on-demand.
AI & ML arxiv | Mar 25
ForestPrune achieves up to 90% token reduction in video MLLMs with minimal accuracy loss using a training-free spatial-temporal forest modeling approach.
AI & ML arxiv | Mar 25
Optimizing autoregressive image models with Group Relative Policy Optimization (GRPO) achieves competitive quality without the 2x inference cost of Classifier-Free Guidance.
AI & ML arxiv | Mar 25
DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.
AI & ML arxiv | Mar 25
ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.
AI & ML arxiv | Mar 25
Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.
AI & ML arxiv | Mar 25
Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation on pure images.
AI & ML arxiv | Mar 25
Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.
AI & ML arxiv | Mar 25
An online length-aware scheduling strategy that eliminates training 'bubbles' during the rollout phase of LLM reinforcement learning.
AI & ML arxiv | Mar 25
Leverages human gaze tracking to assign non-uniform token density in diffusion models, producing perceptually indistinguishable images with significantly less compute.
AI & ML arxiv | Mar 25
Replaces visual token compression with sparse, dynamically selected vision-language interactions in VLLMs.
AI & ML arxiv | Mar 25
Introduces on-the-fly quantization that calibrates to individual prompts during inference, solving the 'domain shift' problem where standard quantization fails on unseen data.
AI & ML arxiv | Mar 25
Memory Sparse Attention (MSA) enables LLMs to scale to 100 million tokens with linear complexity and less than 9% precision degradation.
AI & ML arxiv | Mar 26
The first sorting-free stochastic formulation for 3D Gaussian Splatting that matches rasterization speed while enabling full ray-traced effects.
AI & ML arxiv | Mar 26
AI agent benchmark costs can be cut by ~50% by evaluating only the tasks with intermediate historical pass rates.
AI & ML arxiv | Mar 26
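The selection rule can be sketched as a simple filter: tasks that every agent solves, or that none solves, contribute little to ranking agents, so they are dropped. The 0.2/0.8 thresholds below are illustrative assumptions, not the paper's criterion.

```python
def select_informative_tasks(pass_rates, low=0.2, high=0.8):
    """Keep only tasks with intermediate historical pass rates;
    saturated tasks (everyone passes or everyone fails) carry
    little signal for comparing agents and can be skipped."""
    return [task for task, rate in pass_rates.items() if low <= rate <= high]

history = {"t1": 0.0, "t2": 0.5, "t3": 1.0, "t4": 0.35}
print(select_informative_tasks(history))   # → ['t2', 't4']
```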
Hybrid Distillation Policy Optimization (HDPO) overcomes the 'vanishing gradient' problem for hard mathematical prompts that RL agents cannot solve.
AI & ML arxiv | Mar 26
A self-distillation method for Multi-Token Prediction (MTP) that yields a 220% inference speedup with minimal training cost.
AI & ML arxiv | Mar 26
AttentionPack achieves up to 8x memory efficiency during decoding for large vision-language models (VLMs).
AI & ML arxiv | Mar 26
SLAT-Phys predicts spatially varying material property fields directly from single RGB images with a 120x speedup.
AI & ML arxiv | Mar 26
Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.
AI & ML arxiv | Mar 26
MoE-Sieve reduces Mixture-of-Experts LoRA fine-tuning parameters and training time by ~70% by only adapting the most-frequently activated 'hot' experts.
AI & ML arxiv | Mar 26
Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.
AI & ML arxiv | Mar 26
Enables 1000x faster on-chip training for Weightless Neural Networks (WNNs) on FPGAs with drastically lower power consumption.
AI & ML arxiv | Mar 26
A 5M-parameter OCR model that rivals billion-parameter vision-language models, proving data-centric curation can beat raw parameter scale.
AI & ML arxiv | Mar 26
Achieves high-fidelity sub-seasonal weather forecasting with a 276M-parameter model that matches 1.6B-parameter baselines in accuracy and speed.
AI & ML arxiv | Mar 26
Agentic Variation Operators (AVO) replace fixed evolutionary heuristics with coding agents to discover GPU kernels that outperform FlashAttention-4 by 10.5%.
AI & ML arxiv | Mar 26
DreamerAD accelerates imagination-based training for autonomous driving by 80x, compressing 100-step diffusion sampling down to a single step.
AI & ML arxiv | Mar 26
The Multilevel Euler-Maruyama (ML-EM) method allows diffusion models to perform sampling at the computational cost of a single model evaluation.
AI & ML arxiv | Mar 26
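As background, multilevel Euler-Maruyama schemes build on the standard multilevel Monte Carlo telescoping identity: with $P_\ell$ the quantity of interest simulated at step size $h_\ell = 2^{-\ell} h_0$,

```latex
\mathbb{E}[P_L] \;=\; \mathbb{E}[P_0] \;+\; \sum_{\ell=1}^{L} \mathbb{E}\!\left[P_\ell - P_{\ell-1}\right]
```

Most samples are drawn at the cheap coarse level $P_0$ and only a few at the fine correction levels, so the total cost can approach that of a single fine-level evaluation. How the paper maps this identity onto diffusion model sampling is not specified in the summary above.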
Achieves 6x compute reduction in Multimodal LLMs while actually improving accuracy by 2%.
AI & ML arxiv | Mar 27
Compresses entire Spiking Neural Networks into a single neuron via temporal multiplexing.
AI & ML arxiv | Mar 27
Introduces a stable backpropagation-free training framework for physical and photonic neural networks.
AI & ML arxiv | Mar 27
Achieves state-of-the-art vision-language pretraining using 300x less data than leading methods.
AI & ML arxiv | Mar 27
Enables 10x faster robot trajectory generation by distilling diffusion models into movement primitives.
AI & ML arxiv | Mar 27
Speeds up RL-based reasoning training by 1.7x using an online quality head to prune failing rollouts mid-generation.
AI & ML arxiv | Mar 27
Sparton is a specialized Triton kernel that solves the massive memory bottleneck of Learned Sparse Retrieval (LSR) models like Splade.
AI & ML arxiv | Mar 27
A fully differentiable agent-based traffic simulator enables calibration and control of million-vehicle networks 173x faster than real-time.
AI & ML arxiv | Mar 27
GIFT is a training-free frame selection framework that uses 'Directed Diversity' to boost Video-LLM performance by up to 12.5%.
AI & ML arxiv | Mar 27
Photon enables efficient 3D medical volume understanding through adaptive token scheduling and a novel 'gradient restoration' backpropagation rule.
AI & ML arxiv | Mar 27
Pruning low-utility prompts before RL rollouts allows for 10x more efficient training of large reasoning models.
AI & ML arxiv | Mar 27
Simple image sharpening serves as a surrogate-free, zero-cost preemptive defense against adversarial attacks.
AI & ML arxiv | Mar 27
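The defense is a preprocessing step applied before the classifier. A minimal 1-D unsharp-mask sketch of what "sharpening" means here (the 3-tap box blur and `amount` parameter are illustrative choices, not the paper's filter):

```python
def unsharp_mask(signal, amount=1.0):
    """1-D unsharp masking: blur with a 3-tap box filter, then add
    back the high-frequency residual. Applied as input preprocessing,
    sharpening like this can disturb the low-amplitude patterns that
    adversarial perturbations rely on (illustrative sketch only)."""
    n = len(signal)
    blurred = [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3.0
               for i in range(n)]
    # residual (s - b) is large at edges, small in flat regions
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]
```

On a step edge such as `[0, 0, 1, 1]` the output overshoots above 1 and undershoots below 0 on either side of the edge, which is the characteristic sharpening effect.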
A new tokenization architecture reduces the 'Token Tax' for complex non-Latin scripts by over 60%.
AI & ML arxiv | Mar 27
GlowQ introduces group-shared low-rank approximations to speed up quantized LLM inference by up to 37%.
AI & ML arxiv | Mar 27
Reduces LLM inference energy by 40% (and up to 81%) using a distillation-based router to skip unnecessary reasoning steps.
AI & ML arxiv | Mar 27
Unlocks full-body musculoskeletal humanoid training by achieving order-of-magnitude speedups via massively parallel GPU simulation.
AI & ML arxiv | Mar 27
Achieves 45% performance gains in robotics using 5-10x fewer real-world demonstrations through high-dimensional factorization.
AI & ML arxiv | Mar 27
Achieves up to 4.7x speedup for Diffusion LLMs using a training-free self-speculative decoding framework.
AI & ML arxiv | Mar 27
Generates 2-minute 480p videos on a single H200 GPU through a hierarchical KV-cache strategy that compresses context by 32x.
AI & ML arxiv | Mar 27
Enables 4K novel view synthesis in a feed-forward manner by decoupling geometric complexity from rendering resolution.
AI & ML arxiv | Mar 27
Demonstrates that general-purpose coding agents can achieve 20x speedups in hardware design optimization without domain-specific training.
AI & ML arxiv | Mar 27
A training-free enhancement that unlocks multi-scale synergies in Vision Foundation Models (VFMs) to boost performance across various tasks.
AI & ML arxiv | Mar 27
Prunes 85% of visual tokens in Vision-Language-Action (VLA) models while retaining 94% accuracy for autonomous driving.
AI & ML arxiv | Mar 30
Extracts dense 3D Signed Distance Fields from images in under 3 seconds using feed-forward geometry transformer latents.
AI & ML arxiv | Mar 30
Parallelizes diffusion model sampling across multiple devices using a draft-and-refine process for up to 3.7x speedups.
AI & ML arxiv | Mar 30
Introduces a discrete-ratio selector for context compression that solves the problem of variable information density in long-form text.
AI & ML arxiv | Mar 30
Achieves state-of-the-art video understanding without the need for expensive human-annotated Chain-of-Thought (CoT) data.
AI & ML arxiv | Mar 30
Releases a composable, Optax-native stack that makes high-overhead second-order optimization methods (like K-FAC) practical and swappable.
AI & ML arxiv | Mar 30
Introduces a self-driven collaboration paradigm where an agent uses its own 'reflection' signals to escalate difficult tasks to a stronger model tier.
AI & ML arxiv | Mar 30
Achieves 16x prefill speedup for video models by using reinforcement learning to dynamically compress visual tokens based on temporal 'surprise'.
AI & ML arxiv | Mar 30
Demonstrates real-world robotic navigation policy training and deployment in under 120 minutes using only a consumer laptop and no human intervention.
AI & ML arxiv | Mar 30
Turns pretrained video diffusion models into high-efficiency codecs, achieving high-quality reconstruction at extremely low bitrates (below 0.002 bpp) without retraining.
AI & ML arxiv | Mar 30
Achieves competitive continual learning accuracy with a 90% reduction in memory cost.
AI & ML arxiv | Mar 31
Batch-level query routing for LLMs allows for strict cost and capacity control that per-query methods cannot achieve.
AI & ML arxiv | Mar 31
Achieves high-fidelity LiDAR densification in just 156ms while strictly enforcing sensor physics to prevent 'ghost points'.
AI & ML arxiv | Mar 31
Demonstrates that Liquid Neural Networks can outperform Diffusion Policies in imitation learning with half the parameters and nearly 2x faster inference.
AI & ML arxiv | Mar 31
Achieves a 45x reduction in video generation inference latency and 2.5x higher training throughput using an efficient solution-flow framework.
AI & ML arxiv | Mar 31
GSR-GNN achieves 30x training speedups and 87% memory reduction for deep Graph Neural Networks on circuit graphs.
AI & ML arxiv | Mar 31
Scales Maximum Entropy population synthesis from 20 to 50+ categorical attributes by replacing exact expectation sums with Persistent Contrastive Divergence.
AI & ML arxiv | Mar 31