AI & ML Efficiency Breakthrough

Accelerates diffusion-based image decoders by an order of magnitude using multi-scale sampling and one-step distillation.

March 23, 2026

Original Paper

Accelerating Diffusion Decoders via Multi-Scale Sampling and One-Step Distillation

Chuhan Wang, Hao Chen

arXiv · 2603.19570

The Takeaway

High-fidelity visual tokenizers increasingly rely on slow diffusion decoders. By combining multi-scale sampling with one-step distillation, this framework makes real-time, high-fidelity reconstruction practical, a critical capability for visual generative models that must process or output high-resolution pixels efficiently.

From the abstract

Image tokenization plays a central role in modern generative modeling by mapping visual inputs into compact representations that serve as an intermediate signal between pixels and generative models. Diffusion-based decoders have recently been adopted in image tokenization to reconstruct images from latent representations with high perceptual fidelity. In contrast to diffusion models used for downstream generation, these decoders are dedicated to faithful reconstruction rather than content generation.
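To make the one-step distillation idea concrete, here is a toy numpy sketch. It is not the paper's method: the "teacher" below is a stand-in iterative decoder that refines noise toward a latent-conditioned target over many steps, and the "student" is a single linear map fit by regression to the teacher's outputs. All names (`teacher_decode`, `W_student`, etc.) and the linear setup are illustrative assumptions; the real decoders are deep networks and the distillation objective is more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "diffusion decoder": a multi-step process that refines an initial
# noise image toward a target determined by the latent z. (Illustrative
# stand-in; a real diffusion decoder is a learned denoising network.)
W_true = rng.standard_normal((16, 4))  # 4-dim latent -> 16-dim "image"

def teacher_decode(z, steps=32):
    """Multi-step teacher: each step moves the sample toward the target."""
    x = rng.standard_normal(16)        # start from pure noise
    target = W_true @ z
    for _ in range(steps):
        x = x + 0.2 * (target - x)     # one denoising-style refinement step
    return x

# One-step distillation (simplified): collect teacher reconstructions for
# random latents, then fit a student that maps latent -> image in one shot.
Z = rng.standard_normal((256, 4))
Y = np.stack([teacher_decode(z) for z in Z])       # teacher outputs
W_student, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # student: x = z @ W_student

# The one-step student closely matches the 32-step teacher.
z_test = rng.standard_normal(4)
err = np.linalg.norm(z_test @ W_student - teacher_decode(z_test))
```

The point of the sketch is the cost structure: the teacher pays `steps` refinement passes per reconstruction, while the distilled student pays one, which is the source of the order-of-magnitude speedup the summary describes.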