Aligns diffusion models with human preferences using only 100 samples, outperforming SOTA methods that use thousands.
March 20, 2026
Original Paper
CRAFT: Aligning Diffusion Models with Fine-Tuning Is Easier Than You Think
arXiv · 2603.18991
The Takeaway
CRAFT reduces the data requirement for high-quality preference alignment by more than 10x while converging up to 220x faster than DPO-style methods. For practitioners, this makes bespoke image-generation alignment feasible with minimal manual labeling or curation.
From the abstract
Aligning diffusion models has achieved remarkable breakthroughs in generating high-quality, human preference-aligned images. Existing techniques, such as supervised fine-tuning (SFT) and DPO-style preference optimization, have become principled tools for fine-tuning diffusion models. However, SFT relies on high-quality images that are costly to obtain, while DPO-style methods depend on large-scale preference datasets, which are often inconsistent in quality. Beyond data dependency, these methods …
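For context on the DPO-style preference optimization the abstract contrasts with, here is a minimal scalar sketch of the standard DPO objective: the loss pushes the model to increase its likelihood margin on the preferred sample relative to a frozen reference model. This is a generic illustration, not CRAFT's method; the function name, the `beta` default, and the scalar log-likelihood inputs are all assumptions for exposition.

```python
import math

def dpo_preference_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO-style preference loss (scalar sketch, not the paper's method).

    logp_w / logp_l: model log-likelihoods of the preferred (w) and
    rejected (l) samples; ref_logp_*: the same under a frozen reference
    model. beta scales the implicit reward margin.
    """
    # Implicit reward margin: how much more the model (vs. the reference)
    # favors the preferred sample over the rejected one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log sigmoid(margin): small when the margin is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With no margin the loss sits at log(2); a positive margin drives it lower.
neutral = dpo_preference_loss(-1.0, -2.0, -1.0, -2.0)
improved = dpo_preference_loss(-0.5, -2.5, -1.0, -2.0)
```

Note that every term requires a preference pair (preferred vs. rejected sample), which is exactly the large-scale labeled data dependency the abstract criticizes.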