Achieves high-fidelity diffusion generation in just 3 steps by distilling layer-wise time embeddings from reference trajectories.
March 25, 2026
Original Paper
Three Creates All: You Only Sample 3 Steps
arXiv · 2603.22375
The Takeaway
Unlike previous distillation methods that require heavy fine-tuning, this 'plug-and-play' approach freezes the backbone and only trains a tiny fraction of parameters. It narrows the gap between slow ODE solvers and fast distillation, enabling 10x faster inference on existing pre-trained models.
From the abstract
Diffusion models deliver high-fidelity generation but remain slow at inference time due to many sequential network evaluations. We find that standard timestep conditioning becomes a key bottleneck for few-step sampling. Motivated by layer-dependent denoising dynamics, we propose Multi-layer Time Embedding Optimization (MTEO), which freezes the pretrained diffusion backbone and distills a small set of step-wise, layer-wise time embeddings from reference trajectories. MTEO is plug-and-play with existing pre-trained models.
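To make the mechanism concrete, here is a minimal NumPy sketch of the core idea, not the paper's implementation: the backbone weights stay frozen, and the only trainable parameters are a small `[steps, layers, dim]` tensor of time embeddings, fit to match reference-trajectory targets. The toy backbone, shapes, and the finite-difference optimizer are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STEPS, LAYERS, DIM = 3, 4, 8  # 3 sampling steps, a 4-layer toy backbone

# Frozen "backbone": one fixed linear map per layer (stand-in for the
# pretrained diffusion network, which MTEO does not update).
W = rng.standard_normal((LAYERS, DIM, DIM)) * 0.1

def forward(x, time_emb):
    """Run the frozen layers, injecting a per-layer time embedding."""
    for l in range(LAYERS):
        x = np.tanh(W[l] @ x + time_emb[l])
    return x

# Reference-trajectory targets (in the paper these come from a slow
# many-step solver; random vectors here, purely for illustration).
x0 = rng.standard_normal(DIM)
targets = [rng.standard_normal(DIM) for _ in range(STEPS)]

# The ONLY trainable parameters: step-wise, layer-wise time embeddings.
emb = np.zeros((STEPS, LAYERS, DIM))

def loss(e):
    x, total = x0, 0.0
    for s in range(STEPS):
        x = forward(x, e[s])           # one few-step sampling step
        total += np.mean((x - targets[s]) ** 2)
    return total

init_loss = loss(emb)

# Crude finite-difference descent, just to show that the embeddings
# (and nothing else) are what gets optimized.
eps, lr = 1e-4, 0.1
for _ in range(100):
    base = loss(emb)
    grad = np.zeros_like(emb)
    for idx in np.ndindex(emb.shape):
        e = emb.copy()
        e[idx] += eps
        grad[idx] = (loss(e) - base) / eps
    emb -= lr * grad

final_loss = loss(emb)
print(final_loss < init_loss)  # distilled embeddings fit the references better
```

Because only `STEPS * LAYERS * DIM` scalars are trained while `W` never changes, this mirrors the "plug-and-play" claim: the distilled embeddings can be dropped into a pre-trained model without touching its weights.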