AI & ML Paradigm Shift

VAE tokenizers in Latent Diffusion Models create 'overly compact' manifolds that cause variance collapse, leading to unstable generative sampling.

March 24, 2026

Original Paper

Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models

Qifan Li, Xingyu Zhou, Jinhua Zhang, Weiyi You, Shuhang Gu

arXiv · 2603.21085

The Takeaway

The paper introduces a Variance Expansion loss that explicitly counteracts this variance collapse, making the latent space robust to the stochastic perturbations inherent in diffusion sampling. This is a fundamental fix for the 'unstable' generation often seen in LDMs, prioritizing latent robustness over reconstruction fidelity alone.
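To make the idea concrete, here is a minimal sketch of what a variance-expansion-style regularizer could look like. This is a hypothetical illustration, not the paper's exact formulation: it applies a hinge penalty when per-channel latent variance falls below a target, discouraging the 'overly compact' latent manifold that causes collapse. The function name, `target_var` parameter, and hinge form are all assumptions.

```python
import numpy as np

def variance_expansion_loss(latents, target_var=1.0):
    """Hypothetical sketch of a variance-expansion regularizer.

    latents: array of shape (batch, channels, height, width).
    Penalizes per-channel variance falling below `target_var`,
    pushing back against variance collapse in the latent space.
    """
    # Per-channel variance computed over batch and spatial dimensions.
    var = latents.var(axis=(0, 2, 3))
    # Hinge penalty: zero once a channel's variance reaches the target.
    return np.maximum(target_var - var, 0.0).mean()

# A collapsed latent space (tiny variance) is penalized heavily,
# while a healthy unit-variance space incurs almost no loss.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, size=(8, 4, 16, 16))
collapsed = rng.normal(0.0, 0.1, size=(8, 4, 16, 16))
print(variance_expansion_loss(collapsed) > variance_expansion_loss(healthy))
```

In practice such a term would be added, with some weight, to the tokenizer's reconstruction objective, trading a little fidelity for a latent space that tolerates sampling noise.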

From the abstract

Latent diffusion models have emerged as the dominant framework for high-fidelity and efficient image generation, owing to their ability to learn diffusion processes in compact latent spaces. However, while previous research has focused primarily on reconstruction accuracy and semantic alignment of the latent space, we observe that another critical factor, robustness to sampling perturbations, also plays a crucial role in determining generation quality. Through empirical and theoretical analyses, …