Dataset Concentration (DsCo) achieves nearly lossless dataset reduction by aligning distributions via diffusion models, cutting storage and training costs by half.
March 31, 2026
Original Paper
Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment
arXiv · 2603.27987
The Takeaway
Unlike traditional dataset distillation, which typically trades accuracy for compression, DsCo uses a 'doping' strategy and diffusion-based noise optimization to match full-dataset accuracy with 50% fewer samples. In doing so, it overcomes the efficiency limits that have long constrained surrogate dataset research.
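To build intuition for distribution alignment, here is a toy, illustrative sketch (not the paper's actual method): greedily picking a 50% subset of feature vectors whose mean matches the full dataset's mean far better than a random half would. All names and the moment-matching criterion are assumptions for illustration only; DsCo's actual alignment uses diffusion models.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "dataset" of 200 feature vectors (a stand-in for learned embeddings).
X = rng.normal(size=(200, 8))
target_mean = X.mean(axis=0)

# Greedily select half the samples so the subset mean tracks the full mean.
selected, remaining = [], list(range(len(X)))
running_sum = np.zeros(X.shape[1])
for _ in range(len(X) // 2):
    best, best_err = None, np.inf
    for i in remaining:
        # Error if sample i were added to the current subset.
        err = np.linalg.norm((running_sum + X[i]) / (len(selected) + 1) - target_mean)
        if err < best_err:
            best, best_err = i, err
    selected.append(best)
    remaining.remove(best)
    running_sum += X[best]

subset_err = np.linalg.norm(X[selected].mean(axis=0) - target_mean)
random_half = rng.choice(len(X), len(X) // 2, replace=False)
random_err = np.linalg.norm(X[random_half].mean(axis=0) - target_mean)
print(f"aligned half: {subset_err:.4f}  vs  random half: {random_err:.4f}")
```

Even this crude first-moment matching shows why an aligned 50% subset can stand in for the full data far better than naive subsampling; the paper's diffusion-assisted alignment targets the full distribution, not just its mean.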
From the abstract
The high cost and accessibility problems associated with large datasets hinder the development of large-scale visual recognition systems. Dataset Distillation addresses these problems by synthesizing compact surrogate datasets for efficient training, storage, transfer, and privacy preservation. The existing state-of-the-art diffusion-based dataset distillation methods face three issues: lack of theoretical justification, poor efficiency in scaling to high data volumes, and failure in data-free sc…