Efficiency Breakthrough

375 papers

Warm-Start Flow Matching provides a guaranteed speedup for image/text generation by using lightweight models as initial priors.

Adaptive Layerwise Perturbation (ALP) solves the training-inference mismatch and importance ratio blowup in LLM reinforcement learning.

EvidenceRL uses reinforcement learning (GRPO) to explicitly optimize for evidence adherence, reducing hallucinations in high-stakes RAG pipelines.

Accelerates diffusion-based image decoders by an order of magnitude using multi-scale sampling and one-step distillation.

Reduces covariance tracking error by 30x by reformulating the problem as rigid-body motion on Lie groups.

Achieves a 19x reduction in inference cost and a 16x reduction in latency for agentic workflows by evolving hybrid LLM-and-code pipelines.

Reduces long-context inference latency by 26.4x using a training-free, structure-aware prompt compression framework.

Introduces the first reinforcement learning framework to compress implicit reasoning steps in looped language models.

Achieves O(1) time complexity for dense component attribution in SwiGLU Transformers using a single forward-backward pass.

A training-free method to fix intra-modal misalignment in CLIP by decomposing projectors into an isotropic aligned subspace.

NASimJax provides a 100x throughput increase for autonomous penetration testing simulators by reimplementing the environment in JAX.

SAGE achieves state-of-the-art translation for low-resource languages while reducing training data requirements by 97.1% via RL-guided curation.

Memori reduces agent token costs by 20x by replacing raw conversation history with a persistent layer of semantic triples and summaries.

2K Retrofit enables 2K-resolution inference for any 3D geometric foundation model without modifying or retraining the backbone.

A k-means variant that is up to 7x faster than FAISS and scikit-learn on CPUs and up to 4x faster than cuVS on GPUs.

Reduces the computational cost of Neural Architecture Search for ensembles from O(M) to O(1).

Quantifies LLM uncertainty in a single generation pass without auxiliary models or repeated sampling.

Introduces a long-horizon video agent that uses 93% fewer frames than GPT-5/standalone LMMs while achieving higher accuracy.

Provides a robust method for distilling discrete diffusion models that maintains quality and diversity even with very few sampling steps.

MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.

Q-Drift corrects quantization-induced noise in diffusion models using a plug-and-play sampler adjustment that requires only 5 calibration runs.

Achieves depth-independent training memory, bounded at approximately twice the inference footprint.

A decoder-free world model that trains 1.59x faster than DreamerV3 while outperforming it on tasks with small, task-relevant objects.

Fixes the 'squeezing effect' in Direct Preference Optimization (DPO) using an efficient logit-space Sharpness-Aware Minimization.

PreSCAN predicts NeRF reconstruction quality in under 30 seconds, achieving a 1,000x speedup over Neural Architecture Search.

TopoChunker maps documents to a Structured Intermediate Representation (SIR) to preserve hierarchical context during RAG chunking.

AFBS-BO automates the discovery of layer-specific sparse attention hyperparameters, making long-context acceleration 'plug-and-play.'

Discounted Beta-Bernoulli (DBB) reward estimation solves the variance collapse and sample inefficiency inherent in point-estimation RLVR methods for LLM reasoning.

EntropyCache achieves up to 26x speedup for Diffusion Language Models by using decoded token entropy as a proxy for KV cache staleness.

AIMER provides a calibration-free criterion for expert pruning in MoE models that matches state-of-the-art performance in seconds.

DDPO addresses the 'overthinking' and 'overconfidence' issues in Large Reasoning Models (LRMs) by optimizing answer length based on task difficulty.

Enables high-fidelity 3D satellite surface reconstruction in a single forward pass without per-scene optimization.

Matches the performance of the complex SFT+GRPO reasoning pipeline for Vision-Language Models in one-seventh of the training time.

Provides a mathematically grounded, efficient offline policy optimization method for Diffusion LLMs by estimating trajectory probabilities with a single forward pass.

Uses a lightweight GRPO-trained policy to select optimal video frames, reducing processing time by 93% while improving Video QA accuracy.

Bootstraps reasoning-heavy RL by stochastically injecting few-shot demonstrations into training prompts via a curriculum.

Aligns diffusion models with human preferences using only 100 samples, outperforming SOTA methods that use thousands.

Any-order autoregressive models can outperform diffusion-based classifiers while being 25x more efficient.

A GPU-accelerated metaheuristic framework that solves combinatorial optimization problems orders of magnitude faster than traditional MIP solvers.

Reduces reaction latency in flow-based VLA models by 10x, enabling real-time responsiveness on consumer GPUs.

A 30B MoE model with only 3B active parameters achieves gold-medal-level performance at the International Math and Informatics Olympiads.

HoloByte is a tokenizer-free framework that projects byte sequences into a continuous hyperspherical manifold to bypass the morphological limits of discrete tokens.

AwaRes enables low-resolution Vision-Language Models to retrieve only the high-resolution image crops needed for a specific query via tool-calling.

Systematically profiles VLM inference bottlenecks and releases 'recipes' that cut time-to-first-token by up to 93%.

A backbone-agnostic denoising objective that allows small GNNs to outperform large models pretrained on much larger supervised datasets in the physical sciences.

A dynamic data pruning framework that cuts dense retriever training time by 50% while improving retrieval accuracy.

Achieves up to a 1,000x gain in RLHF data efficiency by using information-directed exploration and epistemic neural networks.

Introduces a reward framework that reduces LLM reasoning verbosity by optimizing for 'Information Density' via entropy reduction per step.

Generates 9 million grid points of 3D spatiotemporal physical fields in seconds, a 10,000x speedup over traditional physics simulations.

Replaces quadratic self-attention with O(N log N) phase-native coupling for time-series, enabling massive context windows.