Provides a robust method for distilling discrete diffusion models that maintains quality and diversity even with very few sampling steps.
March 23, 2026
Original Paper
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD
arXiv · 2603.20155
The Takeaway
Distilling discrete diffusion models (common in text generation) has been significantly harder than distilling continuous ones. This technique allows for fast, high-quality generation that can even outperform the original teacher model, making discrete diffusion much more viable for production.
From the abstract
It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling steps to a handful. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and …
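The excerpt names moment matching via MMD but gives no estimator details. As a rough illustration only, here is a generic squared-MMD estimate between two sets of discrete token sequences, using a simple Hamming-distance kernel; the kernel choice and function names are hypothetical and not taken from the paper.

```python
import numpy as np

def hamming_kernel(x, y, gamma=0.5):
    """Illustrative kernel on token sequences: exp(-gamma * normalized Hamming distance).

    This is an assumed kernel for demonstration; the paper's actual
    kernel (if any) is not specified in the excerpt above.
    """
    dist = np.mean(x != y)  # fraction of positions where tokens differ
    return np.exp(-gamma * dist)

def mmd_squared(X, Y, kernel=hamming_kernel):
    """Plug-in estimate of squared MMD between two sample sets of
    discrete sequences X, Y (integer arrays of shape [n, seq_len]).

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], with the
    within-set means taken over distinct pairs.
    """
    n, m = len(X), len(Y)
    k_xx = np.mean([kernel(X[i], X[j]) for i in range(n) for j in range(n) if i != j])
    k_yy = np.mean([kernel(Y[i], Y[j]) for i in range(m) for j in range(m) if i != j])
    k_xy = np.mean([kernel(x, y) for x in X for y in Y])
    return k_xx + k_yy - 2 * k_xy

# Identical sample sets give MMD^2 = 0; disjoint token sets give a
# strictly positive value.
same = np.zeros((4, 8), dtype=int)
different = np.ones((4, 8), dtype=int)
print(mmd_squared(same, same))       # ~0.0
print(mmd_squared(same, different))  # > 0
```

In a distillation setup, a loss of this form would be minimized between batches of student samples and teacher samples, though the paper's precise training objective is not given in the excerpt.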