AI & ML Efficiency Breakthrough

Dataset Concentration (DsCo) achieves nearly lossless dataset reduction by aligning distributions via diffusion models, cutting storage and training costs by half.

March 31, 2026

Original Paper

Beyond Dataset Distillation: Lossless Dataset Concentration via Diffusion-Assisted Distribution Alignment

Tongfei Liu, Yufan Liu, Bing Li, Weiming Hu

arXiv · 2603.27987

The Takeaway

Unlike traditional dataset distillation, which typically sacrifices accuracy, DsCo uses a "doping" strategy and diffusion-based noise optimization to maintain full-dataset accuracy with 50% fewer samples. It effectively overcomes the efficiency limits that have constrained surrogate-dataset research for years.
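To build intuition for what "distribution alignment" buys you, here is a toy sketch: greedily selecting a half-size subset whose feature mean tracks the full dataset's mean. This is an illustrative stand-in only, not DsCo's actual method, which optimizes diffusion-model noise rather than selecting raw samples; all names below are hypothetical.

```python
import math

def concentrate(features, keep_ratio=0.5):
    """Greedily pick a subset whose feature mean stays close to the
    full dataset's mean -- a toy illustration of distribution
    alignment, NOT the DsCo diffusion-noise algorithm."""
    n, d = len(features), len(features[0])
    target = [sum(x[j] for x in features) / n for j in range(d)]
    k = int(n * keep_ratio)
    chosen, remaining = [], set(range(n))
    running = [0.0] * d  # sum of the chosen feature vectors

    for step in range(1, k + 1):
        # candidate subset mean if sample i were added next
        def err(i):
            mean = [(running[j] + features[i][j]) / step for j in range(d)]
            return math.dist(mean, target)

        best = min(remaining, key=err)
        chosen.append(best)
        remaining.remove(best)
        running = [running[j] + features[best][j] for j in range(d)]
    return chosen
```

In this simplified view, a "concentrated" dataset is one whose summary statistics match the original's, so a model trained on it sees a similar input distribution; DsCo pushes the same idea much further by aligning full distributions via a diffusion model instead of matching a single moment.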

From the abstract

The high cost of and limited access to large datasets hinder the development of large-scale visual recognition systems. Dataset Distillation addresses these problems by synthesizing compact surrogate datasets for efficient training, storage, transfer, and privacy preservation. Existing state-of-the-art diffusion-based dataset distillation methods face three issues: lack of theoretical justification, poor efficiency when scaling to high data volumes, and failure in data-free scenarios.