EFFICIENCY_BREAKTHROUGH
375 papers · Page 2 of 4
Achieves microsecond-level kinodynamic motion planning for high-DOF robots by using differential flatness to solve boundary value problems analytically.
AI & ML arxiv | Mar 18
Demonstrates that masked diffusion language models can be 21.8x more compute-efficient than traditional autoregressive models when scaled correctly.
AI & ML arxiv | Mar 18
Introduces Helium, a serving framework that treats agentic workflows as data query plans to optimize redundant LLM calls and KV caches.
AI & ML arxiv | Mar 18
Presents ZipCal, a model-agnostic calibration data selection strategy for pruning and quantization that is 240x faster than model-based methods.
AI & ML arxiv | Mar 18
VQKV uses Vector Quantization to achieve over 80% KV cache compression with almost zero loss in model performance.
AI & ML arxiv | Mar 18
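The core idea behind VQ-style cache compression can be illustrated with a minimal sketch: learn a small codebook over the cached vectors and store one-byte codes instead of full-precision vectors. Everything here (sizes, the tiny k-means, the codebook size of 128) is an illustrative assumption, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical KV cache: 1,000 key vectors of dimension 64 (float32).
kv = rng.standard_normal((1000, 64)).astype(np.float32)

def kmeans(data, k=128, iters=5):
    """Tiny k-means to learn a shared codebook over cache vectors."""
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        codes = d.argmin(1)
        for c in range(k):
            members = data[codes == c]
            if len(members):
                centroids[c] = members.mean(0)
    return centroids, codes

codebook, codes = kmeans(kv)

# Storage: 1,000 one-byte codes plus the shared codebook,
# versus 1,000 * 64 * 4 bytes for the raw cache.
raw_bytes = kv.nbytes
compressed_bytes = codes.astype(np.uint8).nbytes + codebook.nbytes
print(f"compression: {1 - compressed_bytes / raw_bytes:.0%}")
```

Even this naive setup lands in the >80% range because the codebook cost is amortized across all cached vectors; the paper's "almost zero loss" claim is about quantization error the sketch does not measure.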
FEAT is a linear-complexity foundation model designed specifically for extremely large-scale structured (tabular) data.
AI & ML arxiv | Mar 18
Enables stable 4-bit microscaling (MXFP4) quantization for Multi-modal LLMs, which previously suffered from performance collapse.
AI & ML arxiv | Mar 18
Low-precision optimizer states cause 'state staleness' where updates round back to stored values, but scheduled resets can fully recover performance loss.
AI & ML arxiv | Mar 18
GIST achieves O(N) complexity for Graph Transformers while maintaining gauge invariance, enabling scaling to meshes with 750K nodes.
AI & ML arxiv | Mar 18
Pretrained 3D generative models can be repurposed for high-quality part segmentation using less than 1% of the typical labeled data.
AI & ML arxiv | Mar 18
HoloByte is a tokenizer-free framework that projects byte sequences into a continuous hyperspherical manifold to bypass the morphological limits of discrete tokens.
AI & ML arxiv | Mar 19
AwaRes enables low-resolution Vision-Language Models to retrieve only the high-resolution image crops needed for a specific query via tool-calling.
AI & ML arxiv | Mar 19
Provides a systematic profiling of VLM inference bottlenecks and releases 'recipes' that cut time-to-first-token by up to 93%.
AI & ML arxiv | Mar 19
A backbone-agnostic denoising objective that allows small GNNs to outperform large models pretrained on much larger supervised datasets in physical sciences.
AI & ML arxiv | Mar 19
A dynamic data pruning framework that cuts dense retriever training time by 50% while actually improving retrieval accuracy.
AI & ML arxiv | Mar 19
Achieves up to a 1,000x gain in RLHF data efficiency by using information-directed exploration and epistemic neural networks.
AI & ML arxiv | Mar 19
Introduces a reward framework that reduces LLM reasoning verbosity by optimizing for 'Information Density' via entropy reduction per step.
AI & ML arxiv | Mar 19
Generates 9 million grid points of 3D spatiotemporal physical fields in seconds, a 10,000x speedup over traditional physics simulations.
AI & ML arxiv | Mar 19
Replaces quadratic self-attention with $O(N \log N)$ phase-native coupling for time-series, enabling massive context windows.
AI & ML arxiv | Mar 19
Achieves an 80% reduction in Chain-of-Thought (CoT) tokens while slightly increasing reasoning accuracy.
AI & ML arxiv | Mar 19
Extends LLM context from 32K to 128K by teaching models to selectively skip global attention for ~80% of tokens.
AI & ML arxiv | Mar 19
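The attention pattern described above can be sketched as a mask in which most tokens attend only within a small causal window while a minority keep full global attention. The window size, global fraction, and random selection below are illustrative assumptions, not the paper's learned policy.

```python
import numpy as np

def sparse_attention_mask(n, window=4, global_frac=0.2, seed=0):
    """Causal mask where most tokens attend only within a local
    window; a small fraction retain full (global) attention."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - window):i + 1] = True   # causal local window
    global_idx = rng.choice(n, int(n * global_frac), replace=False)
    for i in global_idx:
        mask[i, :i + 1] = True                     # full causal attention
    return mask

m = sparse_attention_mask(512)
dense_pairs = 512 * 513 // 2  # full causal attention pairs
print(f"attended pairs vs dense causal: {m.sum() / dense_pairs:.1%}")
```

With ~20% of tokens kept global, the mask covers only a small fraction of the dense causal pairs, which is the source of the long-context savings.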
Knowledge-Aware Active Learning (KA2L) uses latent space probing to identify what an LLM doesn't know and generates targeted synthetic questions.
AI & ML arxiv | Mar 19
S-VGGT introduces structure-aware subscene decomposition to break the quadratic scaling bottleneck of 3D foundation models.
AI & ML arxiv | Mar 19
DSS-GAN is the first generative adversarial network to use a Mamba (State Space Model) backbone for high-quality image synthesis.
AI & ML arxiv | Mar 19
Synthetic videos of simple geometric shapes are more effective than massive real-world datasets for teaching video-language models fundamental temporal reasoning.
AI & ML arxiv | Mar 19
Anomaly detection can be performed directly using a primary model's internal neuron output ranges, eliminating the need for expensive external AD models.
AI & ML arxiv | Mar 19
Truncated backpropagation for video decoding reduces the memory cost of fine-tuning video diffusion models from linear to constant.
AI & ML arxiv | Mar 19
ProbeFlow achieves 14.8x faster action decoding in Vision-Language-Action (VLA) models without any retraining.
AI & ML arxiv | Mar 19
Parallel multi-token prediction can be achieved in standard LLMs without training auxiliary models or modifying weights.
AI & ML arxiv | Mar 19
CARE provides a recipe for converting standard GQA models into high-efficiency Multi-head Latent Attention (MLA) architectures.
AI & ML arxiv | Mar 19
VideoAtlas enables navigation and reasoning over long-form video using compute that scales only logarithmically with video length.
AI & ML arxiv | Mar 19
MUD provides a faster, lower-overhead alternative to Muon for transformer training, achieving up to 2.6x higher throughput.
AI & ML arxiv | Mar 19
LoST introduces a semantic-first 3D tokenizer that reduces the token count for 3D shape generation by up to 99.9%.
AI & ML arxiv | Mar 19
MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.
AI & ML arxiv | Mar 20
Q-Drift corrects quantization-induced noise in diffusion models using a plug-and-play sampler adjustment that requires only 5 calibration runs.
AI & ML arxiv | Mar 20
Achieves depth-independent training memory bounded to approximately twice the inference footprint.
AI & ML arxiv | Mar 20
A decoder-free world model that trains 1.59x faster than DreamerV3 while outperforming it on tasks with small, task-relevant objects.
AI & ML arxiv | Mar 20
Fixes the 'squeezing effect' in Direct Preference Optimization (DPO) using an efficient logit-space Sharpness-Aware Minimization.
AI & ML arxiv | Mar 20
PreSCAN predicts NeRF reconstruction quality in under 30 seconds, achieving a 1000x speedup over Neural Architecture Search.
AI & ML arxiv | Mar 20
TopoChunker maps documents to a Structured Intermediate Representation (SIR) to preserve hierarchical context during RAG chunking.
AI & ML arxiv | Mar 20
AFBS-BO automates the discovery of layer-specific sparse attention hyperparameters, making long-context acceleration 'plug-and-play.'
AI & ML arxiv | Mar 20
Discounted Beta-Bernoulli (DBB) reward estimation solves the variance collapse and sample inefficiency inherent in point-estimation RLVR methods for LLM reasoning.
AI & ML arxiv | Mar 20
EntropyCache achieves up to 26x speedup for Diffusion Language Models by using decoded token entropy as a proxy for KV cache staleness.
AI & ML arxiv | Mar 20
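The staleness proxy above can be sketched with plain Shannon entropy over decoded token distributions: low-entropy (confident) predictions suggest the cached states are still valid, while high entropy triggers a refresh. The threshold and refresh rule below are illustrative assumptions, not the paper's calibrated criterion.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of the token distribution (in nats)."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(np.clip(p, 1e-12, None))).sum())

# Hypothetical rule: refresh a block's KV cache only when the mean
# entropy of its decoded tokens exceeds a threshold.
ENTROPY_THRESHOLD = 2.0

def should_refresh(block_logits):
    return np.mean([token_entropy(l) for l in block_logits]) > ENTROPY_THRESHOLD

confident = np.zeros((4, 100)); confident[:, 0] = 10.0  # peaked distributions
uncertain = np.zeros((4, 100))                          # uniform distributions
print(should_refresh(confident), should_refresh(uncertain))  # → False True
```

The speedup in the paper comes from skipping cache recomputation on the confident (low-entropy) blocks, which dominate typical decoding runs.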
AIMER provides a calibration-free criterion for expert pruning in MoE models that matches state-of-the-art performance in seconds.
AI & ML arxiv | Mar 20
DDPO addresses the 'overthinking' and 'overconfidence' issues in Large Reasoning Models (LRMs) by optimizing answer length based on task difficulty.
AI & ML arxiv | Mar 20
Enables high-fidelity 3D satellite surface reconstruction in a single forward pass without per-scene optimization.
AI & ML arxiv | Mar 20
Matches the performance of the complex SFT+GRPO reasoning pipeline for Vision-Language Models in 1/7th of the training time.
AI & ML arxiv | Mar 20
Provides a mathematically grounded, efficient offline policy optimization method for Diffusion LLMs by estimating trajectory probabilities with a single forward pass.
AI & ML arxiv | Mar 20
Uses a lightweight GRPO-trained policy to select optimal video frames, reducing processing time by 93% while actually improving Video QA accuracy.
AI & ML arxiv | Mar 20
Bootstraps reasoning-heavy RL by stochastically injecting few-shot demonstrations into training prompts via a curriculum.
AI & ML arxiv | Mar 20
Aligns diffusion models with human preferences using only 100 samples, outperforming SOTA methods that use thousands.
AI & ML arxiv | Mar 20
Any-order autoregressive models can outperform diffusion-based classifiers while being 25x more efficient.
AI & ML arxiv | Mar 20
A GPU-accelerated metaheuristic framework that solves combinatorial optimization problems orders of magnitude faster than traditional MIP solvers.
AI & ML arxiv | Mar 20
Reduces reaction latency in flow-based VLA models by 10x, enabling real-time responsiveness on consumer GPUs.
AI & ML arxiv | Mar 20
A 30B MoE model with only 3B active parameters achieves Gold Medal-level performance in International Math and Informatics Olympiads.
AI & ML arxiv | Mar 20
Achieves state-of-the-art LLM distillation using 10-25% of the data required by standard fine-tuning.
AI & ML arxiv | Mar 23
Accelerates MoE inference by speculating future experts to overlap CPU-GPU memory transfers with computation.
AI & ML arxiv | Mar 23
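The overlap idea can be sketched with a toy prefetcher: while layer i "computes", a background thread copies the experts speculated for layer i+1 from CPU memory. The fixed expert guess and the dictionary-as-weight-store are illustrative assumptions; the actual system learns the expert prediction and issues real host-to-device transfers.

```python
import threading

# Toy CPU-resident expert store: 5 layers x 8 experts.
cpu_experts = {(layer, e): f"weights[{layer},{e}]"
               for layer in range(5) for e in range(8)}
gpu_cache = {}

def prefetch(layer, expert_ids):
    """Simulated CPU->GPU copy of the experts speculated for `layer`."""
    for e in expert_ids:
        gpu_cache[(layer, e)] = cpu_experts[(layer, e)]

def forward(num_layers=4):
    log = []
    for layer in range(num_layers):
        # Speculate the next layer's experts (fixed guess here) and start
        # the transfer so it overlaps with this layer's compute.
        t = threading.Thread(target=prefetch, args=(layer + 1, [0, 1]))
        t.start()
        log.append(f"compute layer {layer}")  # overlapped with the copy
        t.join()                              # transfer done before next layer
    return log

print(forward())
```

When the speculation is right, the transfer latency hides entirely behind compute; a misprediction falls back to a synchronous fetch.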
Achieves 97% of Oracle reward performance using only 20% of the training labels for complex LLM reasoning.
AI & ML arxiv | Mar 23
The first Joint Embedding Predictive Architecture (JEPA) to train stably end-to-end from raw pixels with massive planning speedups.
AI & ML arxiv | Mar 23
DAPA speeds up GELU computation by 16x and reduces hardware DSP utilization by 16x for on-device Transformer deployment.
AI & ML arxiv | Mar 23
Spectral Tempering achieves near-oracle embedding compression for dense retrieval without requiring any labeled data or grid searching.
AI & ML arxiv | Mar 23
Empirically proves that most Transformer layers are redundant, enabling a 54% training cost reduction through non-uniform budget allocation.
AI & ML arxiv | Mar 23
Warm-Start Flow Matching provides a guaranteed speedup for image/text generation by using lightweight models as initial priors.
AI & ML arxiv | Mar 23
Adaptive Layerwise Perturbation (ALP) solves the training-inference mismatch and importance ratio blowup in LLM reinforcement learning.
AI & ML arxiv | Mar 23
EvidenceRL uses reinforcement learning (GRPO) to explicitly optimize for evidence adherence, reducing hallucinations in high-stakes RAG pipelines.
AI & ML arxiv | Mar 23
Accelerates diffusion-based image decoders by an order of magnitude using multi-scale sampling and one-step distillation.
AI & ML arxiv | Mar 23
Reduces covariance tracking error by 30x by reformulating the problem as rigid-body motion on Lie groups.
AI & ML arxiv | Mar 23
Achieves a 19x reduction in inference cost and 16x in latency for agentic workflows by evolving hybrid LLM-and-code pipelines.
AI & ML arxiv | Mar 23
Reduces long-context inference latency by 26.4x using a training-free, structure-aware prompt compression framework.
AI & ML arxiv | Mar 23
Introduces the first reinforcement learning framework to compress implicit reasoning steps in looped language models.
AI & ML arxiv | Mar 23
Achieves O(1) time complexity for dense component attribution in SwiGLU Transformers using a single forward-backward pass.
AI & ML arxiv | Mar 23
A training-free method to fix intra-modal misalignment in CLIP by decomposing projectors into an isotropic aligned subspace.
AI & ML arxiv | Mar 23
NASimJax provides a 100x throughput increase for autonomous penetration testing simulators by reimplementing the environment in JAX.
AI & ML arxiv | Mar 23
SAGE achieves state-of-the-art translation for low-resource languages while reducing training data requirements by 97.1% via RL-guided curation.
AI & ML arxiv | Mar 23
Memori reduces agent token costs by 20x by replacing raw conversation history with a persistent layer of semantic triples and summaries.
AI & ML arxiv | Mar 23
2K Retrofit enables 2K-resolution inference for any 3D geometric foundation model without modifying or retraining the backbone.
AI & ML arxiv | Mar 23
A k-means variant that is up to 7x faster than FAISS and Scikit-Learn on CPUs and 4x faster than cuVS on GPUs.
AI & ML arxiv | Mar 23
Reduces the computational cost of Neural Architecture Search for ensembles from O(M) to O(1).
AI & ML arxiv | Mar 23
Quantifies LLM uncertainty in a single generation pass without auxiliary models or repeated sampling.
AI & ML arxiv | Mar 23
Introduces a long-horizon video agent that uses 93% fewer frames than GPT-5/standalone LMMs while achieving higher accuracy.
AI & ML arxiv | Mar 23
Provides a robust method for distilling discrete diffusion models that maintains quality and diversity even with very few sampling steps.
AI & ML arxiv | Mar 23
Achieves over 10x faster sampling for diffusion language models by shifting the process into continuous semantic space.
AI & ML arxiv | Mar 24
Integrates fast scalar rewards with slow generative CoT reasoning to reduce reward model token consumption by 20%.
AI & ML arxiv | Mar 24
Enables precise prompt routing by predicting the expected reward of a model before any response is generated.
AI & ML arxiv | Mar 24
Reduces Tree of Thought (ToT) computational overhead by up to 75% using plug-and-play predictors for pruning.
AI & ML arxiv | Mar 24
STAC achieves a 10x memory reduction and 4x speedup for real-time streaming 3D reconstruction using spatio-temporal cache compression.
AI & ML arxiv | Mar 24
DiffMark enables multi-bit watermarking that is transferable across different frozen diffusion models with a 45x speedup over current methods.
AI & ML arxiv | Mar 24
VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.
AI & ML arxiv | Mar 24
GEM is the first native graph-based index for multi-vector (ColBERT-style) retrieval, achieving up to 16x speedups over existing single-vector index adaptations.
AI & ML arxiv | Mar 24
AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.
AI & ML arxiv | Mar 24
Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
AI & ML arxiv | Mar 24
3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.
AI & ML arxiv | Mar 24
Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.
AI & ML arxiv | Mar 24
Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.
AI & ML arxiv | Mar 24
GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.
AI & ML arxiv | Mar 24
MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.
AI & ML arxiv | Mar 24
A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.
AI & ML arxiv | Mar 24
A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.
AI & ML arxiv | Mar 24
A training-free visual token pruning framework for Large Vision-Language Models that preserves geometric structure through subspace reconstruction.
AI & ML arxiv | Mar 24
Free Sinewich enables parameter-efficient multi-task learning using frequency-based weight modulation with near-zero overhead.
AI & ML arxiv | Mar 24