AI & ML Efficiency Breakthrough

Parallelizes diffusion model sampling across multiple devices using a draft-and-refine process for up to 3.7x speedups.

March 30, 2026

Original Paper

DRiffusion: Draft-and-Refine Process Parallelizes Diffusion Models with Ease

Runsheng Bai, Chengyu Zhang, Yangdong Deng

arXiv · 2603.25872

The Takeaway

Most diffusion speedups rely on distillation or on taking fewer sampling steps. This framework instead lets practitioners keep their existing models and simply add more compute/devices, trading hardware for lower latency in interactive applications without retraining or losing quality.

From the abstract

Diffusion models have achieved remarkable success in generating high-fidelity content but suffer from slow, iterative sampling, resulting in high latency that limits their use in interactive applications. We introduce DRiffusion, a parallel sampling framework that parallelizes diffusion inference through a draft-and-refine process. DRiffusion employs skip transitions to generate multiple draft states for future timesteps and computes their corresponding noises in parallel, which are then used in…
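The draft-and-refine idea can be illustrated with a toy fixed-point sketch, in the spirit of parallel (Picard-style) diffusion samplers generally; this is not the authors' actual DRiffusion algorithm. All names here (`predict_noise`, `sequential_sample`, `draft_and_refine_sample`) and the trivial linear "denoiser" are hypothetical stand-ins: drafts for a window of future timesteps are guessed via a crude skip transition, their noises are evaluated in one batched (parallelizable) call, and the drafts are refined until they match the sequential trajectory.

```python
import numpy as np

def predict_noise(x, t):
    # Toy stand-in for a trained noise-prediction network (hypothetical).
    # Vectorized over a batch of draft states, so a multi-device setup
    # could evaluate all drafts concurrently.
    return x * (t / 10.0)

def sequential_sample(x, steps=8):
    # Baseline: one denoising step at a time (one network call per step).
    for t in range(steps, 0, -1):
        x = x - predict_noise(x, t)
    return x

def draft_and_refine_sample(x, steps=8, window=4, iters=4):
    # Sketch of draft-and-refine: guess ("draft") the states for a window
    # of future timesteps, evaluate their noises in one batched call
    # (the parallelizable part), then cheaply re-chain the states and
    # iterate the refinement to a fixed point.
    for start in range(steps, 0, -window):
        w = min(window, start)
        ts = np.arange(start, start - w, -1)       # timesteps in this window
        drafts = np.repeat(x[None], w, axis=0)     # crude skip-transition drafts
        for _ in range(iters):
            noises = predict_noise(drafts, ts[:, None])  # batched "parallel" call
            states = [x]                            # cheap sequential re-chaining
            for i in range(w):
                states.append(states[-1] - noises[i])
            drafts = np.stack(states[:-1])          # refined pre-step states
        x = states[-1]
    return x
```

With `iters` at least as large as the window, each refinement pass fixes one more draft exactly, so the parallel result coincides with the sequential trajectory while the expensive network calls are batched into far fewer rounds.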