EFFICIENCY_BREAKTHROUGH

375 papers · Page 3 of 4

Prompt Replay speeds up GRPO training by selectively reusing 'medium-difficulty' prompts to maximize learning signal in RL rollouts.

AI & ML arxiv | Mar 24
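A minimal sketch of the selection idea (function name and the 0.2–0.8 band are illustrative assumptions, not from the paper): prompts whose rollouts all pass or all fail give near-zero group-relative advantage in GRPO, so only mid-band prompts are worth replaying.

```python
def select_replay_prompts(pass_rates, low=0.2, high=0.8):
    # Too-easy (all rollouts pass) and too-hard (all fail) prompts carry
    # almost no GRPO signal: group-relative advantages collapse when every
    # rollout in the group receives the same reward.
    return sorted(p for p, r in pass_rates.items() if low <= r <= high)

# Hypothetical pass rates measured over previous rollouts.
rates = {"p1": 0.0, "p2": 0.45, "p3": 1.0, "p4": 0.7}
replay = select_replay_prompts(rates)  # → ["p2", "p4"]
```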

Breaks the massive compute barrier for medium-range weather forecasting, training on a single consumer-grade GPU.

AI & ML arxiv | Mar 24

An autonomous agent loop that optimizes GPU kernels to outperform human-expert and compiler-generated baselines.

AI & ML arxiv | Mar 24

Introduces AgentHER, a framework that salvages 'failed' agent trajectories by relabeling them as successful demonstrations for alternative goals.

AI & ML arxiv | Mar 24

TIDE is a post-training early-exit system that allows individual tokens to skip unnecessary layers, improving throughput by up to 8% with minimal calibration.

AI & ML arxiv | Mar 24

PivotRL identifies 'pivot' turns in agent trajectories where actions matter most, enabling compute-efficient reinforcement learning that matches end-to-end RL at 4x lower cost.

AI & ML arxiv | Mar 24

KG-Hopper enables 7B-parameter models to outperform 70B systems on complex Knowledge Graph reasoning by embedding the entire multi-hop process into a single 'thinking' stage.

AI & ML arxiv | Mar 24

Achieves state-of-the-art open-vocabulary segmentation using a training-free, purely geometric projection and propagation method.

AI & ML arxiv | Mar 24

Enables merging independently trained specialist models (e.g., Vision-LLM and Audio-LLM) into a single multimodal model without any paired training data.

AI & ML arxiv | Mar 24

SparseVoxelDet is the first fully sparse object detector for event cameras that never instantiates a dense tensor, achieving 858x GPU memory compression.

AI & ML arxiv | Mar 24

Confidence-Evidence Bayesian Gain (CEBaG) provides deterministic hallucination detection for medical VQA without requiring 10-20 stochastic generations.

AI & ML arxiv | Mar 24

Enables high-performance Zeroth-Order (ZO) fine-tuning of LLMs by leveraging online curvature signals.

AI & ML arxiv | Mar 24

Reduces token consumption in interleaved multimodal reasoning by over 72% using dynamic visual thoughts.

AI & ML arxiv | Mar 24

Eliminates the need for strictly aligned image pairs in infrared and visible image fusion.

AI & ML arxiv | Mar 24

Reduces human annotation requirements for NLP model testing by up to 95%.

AI & ML arxiv | Mar 24

Achieves a 50x reduction in visual tokens for Video-LLMs while preserving over 90% of baseline performance.

AI & ML arxiv | Mar 24

Introduces a learnable bridge between GELU and ReLU activations to enable deployment-friendly piecewise-linear networks.

AI & ML arxiv | Mar 24

Achieves a 75x parameter reduction in 3D medical image segmentation by hybridizing Mamba and Transformer modules.

AI & ML arxiv | Mar 24

Introduces a streaming detection head that stops Large Reasoning Models (LRMs) from 'overthinking' redundant steps.

AI & ML arxiv | Mar 24

Reduces the token count of Stable Diffusion 3.5 by 4x for high-resolution generation with minimal fine-tuning.

AI & ML arxiv | Mar 24

A predictive scheduling system for multi-agent workflows that optimizes serving across heterogeneous LLM clusters (mixing large and small models).

AI & ML arxiv | Mar 24

Enables high-rank (r=384) DoRA training on single GPUs through factored norms and fused Triton kernels.

AI & ML arxiv | Mar 24

Introduces a parallel reasoning mechanism for Vision-Language-Action (VLA) models that eliminates the latency bottleneck of autoregressive Chain-of-Thought.

AI & ML arxiv | Mar 24

A training-free feature caching framework that achieves 2.3x speedup for video world models while maintaining 99.4% quality.

AI & ML arxiv | Mar 24

A unified discrete diffusion framework that outperforms autoregressive models on large-scale discrete generation tasks for the first time.

AI & ML arxiv | Mar 24

Sparse Feature Attention (SFA) reduces attention costs from quadratic in sequence length and linear in dimension to a fraction based on feature sparsity, enabling 2.5x speedups.

AI & ML arxiv | Mar 25

Standard quantization destroys the small parameter 'deltas' that encode post-training knowledge; Delta-Aware Quantization (DAQ) fixes this by optimizing for sign preservation.

AI & ML arxiv | Mar 25

Hybrid Associative Memory (HAM) layers allow the KV cache to grow dynamically based only on information that an internal RNN cannot predict.

AI & ML arxiv | Mar 25

Proposes an agentic architecture that achieves O(1) token complexity relative to dataset size by strictly separating intent parsing from deterministic data execution.

AI & ML arxiv | Mar 25

Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.

AI & ML arxiv | Mar 25

Introduces a verifier that operates directly on the latent hidden states of Diffusion Transformers, avoiding the need for costly pixel-space decoding during inference-time scaling.

AI & ML arxiv | Mar 25

A 0.26M parameter model using continuous dynamics outperforms 27M parameter recursive models on complex logic tasks like Sudoku-Extreme.

AI & ML arxiv | Mar 25

Agile-VLA enables high-frequency robot control on edge devices by decoupling perception from action through implicit affordance anchoring.

AI & ML arxiv | Mar 25

EchoKV introduces a reversible KV cache compression scheme that allows LLMs to switch back to full-precision inference on-demand.

AI & ML arxiv | Mar 25

ForestPrune achieves up to 90% token reduction in video MLLMs with minimal accuracy loss using a training-free spatial-temporal forest modeling approach.

AI & ML arxiv | Mar 25

Optimizing autoregressive image models with Group Relative Policy Optimization (GRPO) achieves competitive quality without the 2x inference cost of Classifier-Free Guidance.

AI & ML arxiv | Mar 25

DILLO enables 14x faster safety-critical agent steering by predicting action consequences from latent states instead of heavy visual simulations.

AI & ML arxiv | Mar 25

ImplicitRM enables unbiased reward modeling from 'messy' implicit feedback (clicks/copies), drastically reducing the cost of RLHF data collection.

AI & ML arxiv | Mar 25

Introduces custom CUDA kernels and a sparse packing format that enables Transformers to maintain performance with over 99% feedforward sparsity.

AI & ML arxiv | Mar 25

Upgrades video Diffusion Transformers to ultra-high-resolution synthesis using a two-stage 'Relay LoRA' adaptation on pure images.

AI & ML arxiv | Mar 25

Challenges the dominance of on-policy RL for LLMs by introducing a practical off-policy value-based framework that enables data reuse.

AI & ML arxiv | Mar 25

An online length-aware scheduling strategy that eliminates training 'bubbles' during the rollout phase of LLM reinforcement learning.

AI & ML arxiv | Mar 25

Leverages human gaze tracking to assign non-uniform token density in diffusion models, creating perceptually perfect images with significantly less compute.

AI & ML arxiv | Mar 25

Replaces visual token compression with sparse, dynamically selected vision-language interactions in VLLMs.

AI & ML arxiv | Mar 25

Introduces on-the-fly quantization that calibrates to individual prompts during inference, solving the 'domain shift' problem where standard quantization fails on unseen data.

AI & ML arxiv | Mar 25

Memory Sparse Attention (MSA) enables LLMs to scale to 100 million tokens with linear complexity and less than 9% precision degradation.

AI & ML arxiv | Mar 26

The first sorting-free stochastic formulation for 3D Gaussian Splatting that matches rasterization speed while enabling full ray-traced effects.

AI & ML arxiv | Mar 26

AI agent benchmark costs can be cut by ~50% by evaluating only on tasks with intermediate historical pass rates.

AI & ML arxiv | Mar 26

Hybrid Distillation Policy Optimization (HDPO) overcomes the 'vanishing gradient' problem for hard mathematical prompts that RL agents cannot solve.

AI & ML arxiv | Mar 26

A self-distillation method for Multi-Token Prediction (MTP) that yields a 220% inference speedup with minimal training cost.

AI & ML arxiv | Mar 26

AttentionPack achieves up to 8x memory efficiency during decoding for large vision-language models (VLMs).

AI & ML arxiv | Mar 26

SLAT-Phys predicts spatially varying material property fields directly from single RGB images with a 120x speedup.

AI & ML arxiv | Mar 26

Reduces Text-to-SQL input tokens by 99% by internalizing the database schema into the model weights through a two-phase fine-tuning approach.

AI & ML arxiv | Mar 26

MoE-Sieve reduces Mixture-of-Experts LoRA fine-tuning parameters and training time by ~70% by adapting only the most frequently activated 'hot' experts.

AI & ML arxiv | Mar 26

Achieves up to 400x speedup and 64x memory reduction for open-vocabulary 3D scene understanding compared to current Gaussian Splatting methods.

AI & ML arxiv | Mar 26

Enables 1000x faster on-chip training for Weightless Neural Networks (WNNs) on FPGAs with drastically lower power consumption.

AI & ML arxiv | Mar 26

A 5M-parameter OCR model that rivals billion-parameter vision-language models, proving data-centric curation can beat raw parameter scale.

AI & ML arxiv | Mar 26

Achieves high-fidelity sub-seasonal weather forecasting with a 276M parameter model that matches 1.6B parameter baselines in accuracy and speed.

AI & ML arxiv | Mar 26

Agentic Variation Operators (AVO) replace fixed evolutionary heuristics with coding agents to discover GPU kernels that outperform FlashAttention-4 by 10.5%.

AI & ML arxiv | Mar 26

DreamerAD accelerates imagination-based training for autonomous driving by 80x, compressing 100-step diffusion sampling down to a single step.

AI & ML arxiv | Mar 26

The Multilevel Euler-Maruyama (ML-EM) method allows diffusion models to perform sampling at the computational cost of a single model evaluation.

AI & ML arxiv | Mar 26

Achieves 6x compute reduction in Multimodal LLMs while actually improving accuracy by 2%.

AI & ML arxiv | Mar 27

Reconstructs entire Spiking Neural Networks into a single neuron via temporal multiplexing.

AI & ML arxiv | Mar 27

Introduces a stable backpropagation-free training framework for physical and photonic neural networks.

AI & ML arxiv | Mar 27

Achieves state-of-the-art vision-language pretraining using 300x less data than leading methods.

AI & ML arxiv | Mar 27

Enables 10x faster robot trajectory generation by distilling diffusion models into movement primitives.

AI & ML arxiv | Mar 27

Speeds up RL-based reasoning training by 1.7x using an online quality head to prune failing rollouts mid-generation.

AI & ML arxiv | Mar 27

Sparton is a specialized Triton kernel that solves the massive memory bottleneck of Learned Sparse Retrieval (LSR) models like Splade.

AI & ML arxiv | Mar 27

A fully differentiable agent-based traffic simulator enables calibration and control of million-vehicle networks 173x faster than real-time.

AI & ML arxiv | Mar 27

GIFT is a training-free frame selection framework that uses 'Directed Diversity' to boost Video-LLM performance by up to 12.5%.

AI & ML arxiv | Mar 27

Photon enables efficient 3D medical volume understanding through adaptive token scheduling and a novel 'gradient restoration' backpropagation rule.

AI & ML arxiv | Mar 27

Pruning low-utility prompts before RL rollouts allows for 10x more efficient training of large reasoning models.

AI & ML arxiv | Mar 27

Simple image sharpening serves as a surrogate-free, zero-cost preemptive defense against adversarial attacks.

AI & ML arxiv | Mar 27

A new tokenization architecture reduces the 'Token Tax' for complex non-Latin scripts by over 60%.

AI & ML arxiv | Mar 27

GlowQ introduces group-shared low-rank approximations to speed up quantized LLM inference by up to 37%.

AI & ML arxiv | Mar 27

Reduces LLM inference energy by 40% (and up to 81%) using a distillation-based router to skip unnecessary reasoning steps.

AI & ML arxiv | Mar 27

Unlocks full-body musculoskeletal humanoid training by achieving order-of-magnitude speedups via massively parallel GPU simulation.

AI & ML arxiv | Mar 27

Achieves 45% performance gains in robotics using 5-10x fewer real-world demonstrations through high-dimensional factorization.

AI & ML arxiv | Mar 27

Achieves up to 4.7x speedup for Diffusion LLMs using a training-free self-speculative decoding framework.

AI & ML arxiv | Mar 27

Generates 2-minute 480p videos on a single H200 GPU through a hierarchical KV-cache strategy that compresses context by 32x.

AI & ML arxiv | Mar 27

Enables 4K novel view synthesis in a feed-forward manner by decoupling geometric complexity from rendering resolution.

AI & ML arxiv | Mar 27

Demonstrates that general-purpose coding agents can achieve 20x speedups in hardware design optimization without domain-specific training.

AI & ML arxiv | Mar 27

A training-free enhancement that unlocks multi-scale synergies in Vision Foundation Models (VFMs) to boost performance across various tasks.

AI & ML arxiv | Mar 27

Prunes 85% of visual tokens in Vision-Language-Action (VLA) models while retaining 94% accuracy for autonomous driving.

AI & ML arxiv | Mar 30

Extracts dense 3D Signed Distance Fields from images in under 3 seconds using feed-forward geometry transformer latents.

AI & ML arxiv | Mar 30

Parallelizes diffusion model sampling across multiple devices using a draft-and-refine process for up to 3.7x speedups.

AI & ML arxiv | Mar 30

Introduces a discrete-ratio selector for context compression that solves the problem of variable information density in long-form text.

AI & ML arxiv | Mar 30

Achieves state-of-the-art video understanding without the need for expensive human-annotated Chain-of-Thought (CoT) data.

AI & ML arxiv | Mar 30

Releases a composable, Optax-native stack that makes high-overhead second-order optimization methods (like K-FAC) practical and swappable.

AI & ML arxiv | Mar 30

Introduces a self-driven collaboration paradigm where an agent uses its own 'reflection' signals to escalate difficult tasks to a stronger model tier.

AI & ML arxiv | Mar 30

Achieves 16x prefill speedup for video models by using reinforcement learning to dynamically compress visual tokens based on temporal 'surprise'.

AI & ML arxiv | Mar 30

Demonstrates real-world robotic navigation policy training and deployment in under 120 minutes using only a consumer laptop and no human intervention.

AI & ML arxiv | Mar 30

Turns pretrained video diffusion models into high-efficiency codecs, achieving high-quality reconstruction at extremely low bitrates (below 0.002 bpp) without retraining.

AI & ML arxiv | Mar 30

Achieves competitive continual learning accuracy with a 90% reduction in memory cost.

AI & ML arxiv | Mar 31

Batch-level query routing for LLMs allows for strict cost and capacity control that per-query methods cannot achieve.

AI & ML arxiv | Mar 31

Achieves high-fidelity LiDAR densification in just 156ms while strictly enforcing sensor physics to prevent 'ghost points'.

AI & ML arxiv | Mar 31

Demonstrates that Liquid Neural Networks can outperform Diffusion Policies in imitation learning with half the parameters and nearly 2x faster inference.

AI & ML arxiv | Mar 31

Achieves a 45x reduction in video generation inference latency and 2.5x higher training throughput using an efficient solution-flow framework.

AI & ML arxiv | Mar 31

GSR-GNN achieves 30x training speedups and 87% memory reduction for deep Graph Neural Networks on circuit graphs.

AI & ML arxiv | Mar 31

Scales Maximum Entropy population synthesis from 20 to 50+ categorical attributes by replacing exact expectation sums with Persistent Contrastive Divergence.

AI & ML arxiv | Mar 31