EFFICIENCY_BREAKTHROUGH
375 papers · Page 2 of 4
Achieves microsecond-level kinodynamic motion planning for high-DOF robots by using differential flatness to solve boundary value problems analytically.
AI & ML arxiv | Mar 18
Demonstrates that masked diffusion language models can be 21.8x more compute-efficient than traditional autoregressive models when scaled correctly.
AI & ML arxiv | Mar 18
Introduces Helium, a serving framework that treats agentic workflows as data query plans to optimize redundant LLM calls and KV caches.
AI & ML arxiv | Mar 18
Presents ZipCal, a model-agnostic calibration data selection strategy for pruning and quantization that is 240x faster than model-based methods.
AI & ML arxiv | Mar 18
VQKV uses Vector Quantization to achieve over 80% KV cache compression with almost zero loss in model performance.
AI & ML arxiv | Mar 18
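The core idea behind VQ-style cache compression can be illustrated with a minimal sketch: learn a small codebook over the cached vectors and store one-byte codes instead of full-precision vectors. Everything here (sizes, the tiny k-means, the codebook size of 128) is an illustrative assumption, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical KV cache: 1,000 key vectors of dimension 64 (float32).
kv = rng.standard_normal((1000, 64)).astype(np.float32)

def kmeans(data, k=128, iters=5):
    """Tiny k-means to learn a shared codebook over cache vectors."""
    centroids = data[rng.choice(len(data), k, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        codes = d.argmin(1)
        for c in range(k):
            members = data[codes == c]
            if len(members):
                centroids[c] = members.mean(0)
    return centroids, codes

codebook, codes = kmeans(kv)

# Storage: 1,000 one-byte codes plus the shared codebook,
# versus 1,000 * 64 * 4 bytes for the raw cache.
raw_bytes = kv.nbytes
compressed_bytes = codes.astype(np.uint8).nbytes + codebook.nbytes
print(f"compression: {1 - compressed_bytes / raw_bytes:.0%}")
```

Even this naive setup lands in the >80% range because the codebook cost is amortized across all cached vectors; the paper's "almost zero loss" claim is about quantization error the sketch does not measure.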
FEAT is a linear-complexity foundation model designed specifically for extremely large-scale structured (tabular) data.
AI & ML arxiv | Mar 18
Enables stable 4-bit microscaling (MXFP4) quantization for Multi-modal LLMs, which previously suffered from performance collapse.
AI & ML arxiv | Mar 18
Low-precision optimizer states cause 'state staleness' where updates round back to stored values, but scheduled resets can fully recover performance loss.
AI & ML arxiv | Mar 18
GIST achieves O(N) complexity for Graph Transformers while maintaining gauge invariance, enabling scaling to meshes with 750K nodes.
AI & ML arxiv | Mar 18
Pretrained 3D generative models can be repurposed for high-quality part segmentation using less than 1% of the typical labeled data.
AI & ML arxiv | Mar 18
HoloByte is a tokenizer-free framework that projects byte sequences into a continuous hyperspherical manifold to bypass the morphological limits of discrete tokens.
AI & ML arxiv | Mar 19
AwaRes enables low-resolution Vision-Language Models to retrieve only the high-resolution image crops needed for a specific query via tool-calling.
AI & ML arxiv | Mar 19
Provides a systematic profiling of VLM inference bottlenecks and releases 'recipes' that cut time-to-first-token by up to 93%.
AI & ML arxiv | Mar 19
A backbone-agnostic denoising objective that allows small GNNs to outperform large models pretrained on much larger supervised datasets in physical sciences.
AI & ML arxiv | Mar 19
A dynamic data pruning framework that cuts dense retriever training time by 50% while actually improving retrieval accuracy.
AI & ML arxiv | Mar 19
Achieves up to a 1,000x gain in RLHF data efficiency by using information-directed exploration and epistemic neural networks.
AI & ML arxiv | Mar 19
Introduces a reward framework that reduces LLM reasoning verbosity by optimizing for 'Information Density' via entropy reduction per step.
AI & ML arxiv | Mar 19
Generates 9 million grid points of 3D spatiotemporal physical fields in seconds, a 10,000x speedup over traditional physics simulations.
AI & ML arxiv | Mar 19
Replaces quadratic self-attention with $O(N \log N)$ phase-native coupling for time-series, enabling massive context windows.
AI & ML arxiv | Mar 19
Achieves an 80% reduction in Chain-of-Thought (CoT) tokens while slightly increasing reasoning accuracy.
AI & ML arxiv | Mar 19
Extends LLM context from 32K to 128K by teaching models to selectively skip global attention for ~80% of tokens.
AI & ML arxiv | Mar 19
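The attention pattern described above can be sketched as a mask in which most tokens attend only within a small causal window while a minority keep full global attention. The window size, global fraction, and random selection below are illustrative assumptions, not the paper's learned policy.

```python
import numpy as np

def sparse_attention_mask(n, window=4, global_frac=0.2, seed=0):
    """Causal mask where most tokens attend only within a local
    window; a small fraction retain full (global) attention."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n, n), dtype=bool)
    for i in range(n):
        mask[i, max(0, i - window):i + 1] = True   # causal local window
    global_idx = rng.choice(n, int(n * global_frac), replace=False)
    for i in global_idx:
        mask[i, :i + 1] = True                     # full causal attention
    return mask

m = sparse_attention_mask(512)
dense_pairs = 512 * 513 // 2  # full causal attention pairs
print(f"attended pairs vs dense causal: {m.sum() / dense_pairs:.1%}")
```

With ~20% of tokens kept global, the mask covers only a small fraction of the dense causal pairs, which is the source of the long-context savings.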
Knowledge-Aware Active Learning (KA2L) uses latent space probing to identify what an LLM doesn't know and generates targeted synthetic questions.
AI & ML arxiv | Mar 19
S-VGGT introduces structure-aware subscene decomposition to break the quadratic scaling bottleneck of 3D foundation models.
AI & ML arxiv | Mar 19
DSS-GAN is the first generative adversarial network to use a Mamba (State Space Model) backbone for high-quality image synthesis.
AI & ML arxiv | Mar 19
Synthetic videos of simple geometric shapes are more effective than massive real-world datasets for teaching video-language models fundamental temporal reasoning.
AI & ML arxiv | Mar 19
Anomaly detection can be performed directly using a primary model's internal neuron output ranges, eliminating the need for expensive external AD models.
AI & ML arxiv | Mar 19
Truncated backpropagation for video decoding reduces the memory cost of fine-tuning video diffusion models from linear to constant.
AI & ML arxiv | Mar 19
ProbeFlow achieves 14.8x faster action decoding in Vision-Language-Action (VLA) models without any retraining.
AI & ML arxiv | Mar 19
Parallel multi-token prediction can be achieved in standard LLMs without training auxiliary models or modifying weights.
AI & ML arxiv | Mar 19
CARE provides a recipe for converting standard GQA models into high-efficiency Multi-head Latent Attention (MLA) architectures.
AI & ML arxiv | Mar 19
VideoAtlas enables navigation and reasoning over long-form video using compute that scales only logarithmically with video length.
AI & ML arxiv | Mar 19
MUD provides a faster, lower-overhead alternative to Muon for transformer training, achieving up to 2.6x higher throughput.
AI & ML arxiv | Mar 19
LoST introduces a semantic-first 3D tokenizer that reduces the token count for 3D shape generation by up to 99.9%.
AI & ML arxiv | Mar 19
MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.
AI & ML arxiv | Mar 20
Q-Drift corrects quantization-induced noise in diffusion models using a plug-and-play sampler adjustment that requires only 5 calibration runs.
AI & ML arxiv | Mar 20
Achieves depth-independent training memory bounded to approximately twice the inference footprint.
AI & ML arxiv | Mar 20
A decoder-free world model that trains 1.59x faster than DreamerV3 while outperforming it on tasks with small, task-relevant objects.
AI & ML arxiv | Mar 20
Fixes the 'squeezing effect' in Direct Preference Optimization (DPO) using an efficient logit-space Sharpness-Aware Minimization.
AI & ML arxiv | Mar 20
PreSCAN predicts NeRF reconstruction quality in under 30 seconds, achieving a 1000x speedup over Neural Architecture Search.
AI & ML arxiv | Mar 20
TopoChunker maps documents to a Structured Intermediate Representation (SIR) to preserve hierarchical context during RAG chunking.
AI & ML arxiv | Mar 20
AFBS-BO automates the discovery of layer-specific sparse attention hyperparameters, making long-context acceleration 'plug-and-play.'
AI & ML arxiv | Mar 20
Discounted Beta-Bernoulli (DBB) reward estimation solves the variance collapse and sample inefficiency inherent in point-estimation RLVR methods for LLM reasoning.
AI & ML arxiv | Mar 20
EntropyCache achieves up to 26x speedup for Diffusion Language Models by using decoded token entropy as a proxy for KV cache staleness.
AI & ML arxiv | Mar 20
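The staleness proxy above can be sketched with plain Shannon entropy over decoded token distributions: low-entropy (confident) predictions suggest the cached states are still valid, while high entropy triggers a refresh. The threshold and refresh rule below are illustrative assumptions, not the paper's calibrated criterion.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of the token distribution (in nats)."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return float(-(p * np.log(np.clip(p, 1e-12, None))).sum())

# Hypothetical rule: refresh a block's KV cache only when the mean
# entropy of its decoded tokens exceeds a threshold.
ENTROPY_THRESHOLD = 2.0

def should_refresh(block_logits):
    return np.mean([token_entropy(l) for l in block_logits]) > ENTROPY_THRESHOLD

confident = np.zeros((4, 100)); confident[:, 0] = 10.0  # peaked distributions
uncertain = np.zeros((4, 100))                          # uniform distributions
print(should_refresh(confident), should_refresh(uncertain))  # → False True
```

The speedup in the paper comes from skipping cache recomputation on the confident (low-entropy) blocks, which dominate typical decoding runs.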
AIMER provides a calibration-free criterion for expert pruning in MoE models that matches state-of-the-art performance in seconds.
AI & ML arxiv | Mar 20
DDPO addresses the 'overthinking' and 'overconfidence' issues in Large Reasoning Models (LRMs) by optimizing answer length based on task difficulty.
AI & ML arxiv | Mar 20
Enables high-fidelity 3D satellite surface reconstruction in a single forward pass without per-scene optimization.
AI & ML arxiv | Mar 20
Matches the performance of the complex SFT+GRPO reasoning pipeline for Vision-Language Models in 1/7th of the training time.
AI & ML arxiv | Mar 20
Provides a mathematically grounded, efficient offline policy optimization method for Diffusion LLMs by estimating trajectory probabilities with a single forward pass.
AI & ML arxiv | Mar 20
Uses a lightweight GRPO-trained policy to select optimal video frames, reducing processing time by 93% while actually improving Video QA accuracy.
AI & ML arxiv | Mar 20
Bootstraps reasoning-heavy RL by stochastically injecting few-shot demonstrations into training prompts via a curriculum.
AI & ML arxiv | Mar 20
Aligns diffusion models with human preferences using only 100 samples, outperforming SOTA methods that use thousands.
AI & ML arxiv | Mar 20
Any-order autoregressive models can outperform diffusion-based classifiers while being 25x more efficient.
AI & ML arxiv | Mar 20
A GPU-accelerated metaheuristic framework that solves combinatorial optimization problems orders of magnitude faster than traditional MIP solvers.
AI & ML arxiv | Mar 20
Reduces reaction latency in flow-based VLA models by 10x, enabling real-time responsiveness on consumer GPUs.
AI & ML arxiv | Mar 20
A 30B MoE model with only 3B active parameters achieves Gold Medal-level performance in International Math and Informatics Olympiads.
AI & ML arxiv | Mar 20
Achieves state-of-the-art LLM distillation using 10-25% of the data required by standard fine-tuning.
AI & ML arxiv | Mar 23
Accelerates MoE inference by speculating future experts to overlap CPU-GPU memory transfers with computation.
AI & ML arxiv | Mar 23
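The overlap idea can be sketched with a toy prefetcher: while layer i "computes", a background thread copies the experts speculated for layer i+1 from CPU memory. The fixed expert guess and the dictionary-as-weight-store are illustrative assumptions; the actual system learns the expert prediction and issues real host-to-device transfers.

```python
import threading

# Toy CPU-resident expert store: 5 layers x 8 experts.
cpu_experts = {(layer, e): f"weights[{layer},{e}]"
               for layer in range(5) for e in range(8)}
gpu_cache = {}

def prefetch(layer, expert_ids):
    """Simulated CPU->GPU copy of the experts speculated for `layer`."""
    for e in expert_ids:
        gpu_cache[(layer, e)] = cpu_experts[(layer, e)]

def forward(num_layers=4):
    log = []
    for layer in range(num_layers):
        # Speculate the next layer's experts (fixed guess here) and start
        # the transfer so it overlaps with this layer's compute.
        t = threading.Thread(target=prefetch, args=(layer + 1, [0, 1]))
        t.start()
        log.append(f"compute layer {layer}")  # overlapped with the copy
        t.join()                              # transfer done before next layer
    return log

print(forward())
```

When the speculation is right, the transfer latency hides entirely behind compute; a misprediction falls back to a synchronous fetch.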
Achieves 97% of Oracle reward performance using only 20% of the training labels for complex LLM reasoning.
AI & ML arxiv | Mar 23
The first Joint Embedding Predictive Architecture (JEPA) to train stably end-to-end from raw pixels with massive planning speedups.
AI & ML arxiv | Mar 23
DAPA speeds up GELU computation by 16x and reduces hardware DSP utilization by 16x for on-device Transformer deployment.
AI & ML arxiv | Mar 23
Spectral Tempering achieves near-oracle embedding compression for dense retrieval without requiring any labeled data or grid searching.
AI & ML arxiv | Mar 23
Empirically proves that most Transformer layers are redundant, enabling a 54% training cost reduction through non-uniform budget allocation.
AI & ML arxiv | Mar 23
Warm-Start Flow Matching provides a guaranteed speedup for image/text generation by using lightweight models as initial priors.
AI & ML arxiv | Mar 23
Adaptive Layerwise Perturbation (ALP) solves the training-inference mismatch and importance ratio blowup in LLM reinforcement learning.
AI & ML arxiv | Mar 23
EvidenceRL uses reinforcement learning (GRPO) to explicitly optimize for evidence adherence, reducing hallucinations in high-stakes RAG pipelines.
AI & ML arxiv | Mar 23
Accelerates diffusion-based image decoders by an order of magnitude using multi-scale sampling and one-step distillation.
AI & ML arxiv | Mar 23
Reduces covariance tracking error by 30x by reformulating the problem as rigid-body motion on Lie groups.
AI & ML arxiv | Mar 23
Achieves a 19x reduction in inference cost and 16x in latency for agentic workflows by evolving hybrid LLM-and-code pipelines.
AI & ML arxiv | Mar 23
Reduces long-context inference latency by 26.4x using a training-free, structure-aware prompt compression framework.
AI & ML arxiv | Mar 23
Introduces the first reinforcement learning framework to compress implicit reasoning steps in looped language models.
AI & ML arxiv | Mar 23
Achieves O(1) time complexity for dense component attribution in SwiGLU Transformers using a single forward-backward pass.
AI & ML arxiv | Mar 23
A training-free method to fix intra-modal misalignment in CLIP by decomposing projectors into an isotropic aligned subspace.
AI & ML arxiv | Mar 23
NASimJax provides a 100x throughput increase for autonomous penetration testing simulators by reimplementing the environment in JAX.
AI & ML arxiv | Mar 23
SAGE achieves state-of-the-art translation for low-resource languages while reducing training data requirements by 97.1% via RL-guided curation.
AI & ML arxiv | Mar 23
Memori reduces agent token costs by 20x by replacing raw conversation history with a persistent layer of semantic triples and summaries.
AI & ML arxiv | Mar 23
2K Retrofit enables 2K-resolution inference for any 3D geometric foundation model without modifying or retraining the backbone.
AI & ML arxiv | Mar 23
A k-means variant that is up to 7x faster than FAISS and Scikit-Learn on CPUs and 4x faster than cuVS on GPUs.
AI & ML arxiv | Mar 23
Reduces the computational cost of Neural Architecture Search for ensembles from O(M) to O(1).
AI & ML arxiv | Mar 23
Quantifies LLM uncertainty in a single generation pass without auxiliary models or repeated sampling.
AI & ML arxiv | Mar 23
Introduces a long-horizon video agent that uses 93% fewer frames than GPT-5/standalone LMMs while achieving higher accuracy.
AI & ML arxiv | Mar 23
Provides a robust method for distilling discrete diffusion models that maintains quality and diversity even with very few sampling steps.
AI & ML arxiv | Mar 23
Achieves over 10x faster sampling for diffusion language models by shifting the process into continuous semantic space.
AI & ML arxiv | Mar 24
Integrates fast scalar rewards with slow generative CoT reasoning to reduce reward model token consumption by 20%.
AI & ML arxiv | Mar 24
Enables precise prompt routing by predicting the expected reward of a model before any response is generated.
AI & ML arxiv | Mar 24
Reduces Tree of Thought (ToT) computational overhead by up to 75% using plug-and-play predictors for pruning.
AI & ML arxiv | Mar 24
STAC achieves a 10x memory reduction and 4x speedup for real-time streaming 3D reconstruction using spatio-temporal cache compression.
AI & ML arxiv | Mar 24
DiffMark enables multi-bit watermarking that is transferable across different frozen diffusion models with a 45x speedup over current methods.
AI & ML arxiv | Mar 24
VGS-Decoding is a training-free method to mitigate medical VLM hallucinations by reweighting token probabilities based on their visual dependency.
AI & ML arxiv | Mar 24
GEM is the first native graph-based index for multi-vector (ColBERT-style) retrieval, achieving up to 16x speedups over existing single-vector index adaptations.
AI & ML arxiv | Mar 24
AE-LLM automatically orchestrates the optimal combination of MoE, quantization, and PEFT for specific deployment hardware and tasks.
AI & ML arxiv | Mar 24
Row-Momentum Normalized Preconditioning (RMNP) provides Muon-level performance with significantly lower computational complexity.
AI & ML arxiv | Mar 24
3D object localization can be achieved 100x faster by using image-based 'visual memory' instead of global 3D scene reconstruction.
AI & ML arxiv | Mar 24
Vision-Language Models can be steered to understand negation using geometry-based representation engineering without any fine-tuning.
AI & ML arxiv | Mar 24
Memory-Keyed Attention (MKA) achieves 5x faster training throughput and nearly 2x lower latency while matching the accuracy of compressed attention variants.
AI & ML arxiv | Mar 24
GaussianPile adapts 3D Gaussian Splatting for volumetric imaging, achieving 11x faster reconstruction than NeRFs and 16x compression over voxel grids.
AI & ML arxiv | Mar 24
MixedDimKV achieves 100% accuracy on 50K context lengths while using as little as 0.26% of the traditional KV cache.
AI & ML arxiv | Mar 24
A low-resource SOP using 'Shadow-RAG' enables 32B models to reach 90% accuracy on graduate-level exams with only 3 days of labor.
AI & ML arxiv | Mar 24
A routing framework that uses internal prefill activations to select the optimal LLM for a task, capturing 45% of the oracle accuracy gap with 74% cost savings.
AI & ML arxiv | Mar 24
A training-free visual token pruning framework for Large Vision-Language Models that preserves geometric structure through subspace reconstruction.
AI & ML arxiv | Mar 24
Free Sinewich enables parameter-efficient multi-task learning using frequency-based weight modulation with near-zero overhead.
AI & ML arxiv | Mar 24