VAE tokenizers in Latent Diffusion Models create 'overly compact' manifolds that cause variance collapse, leading to unstable generative sampling.
March 24, 2026
Original Paper
Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models
arXiv · 2603.21085
The Takeaway
The paper introduces a Variance Expansion loss that explicitly counteracts this variance collapse, making the latent space robust to the stochastic perturbations inherent in diffusion sampling. This is a fundamental fix for the 'unstable' generation often seen in LDMs, prioritizing latent robustness over reconstruction fidelity alone.
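The core idea can be illustrated with a minimal sketch: penalize latent dimensions whose batch variance falls below a floor, so the tokenizer cannot collapse the manifold into an overly compact region. The function name, the `target_var` parameter, and the hinge-style penalty below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def variance_expansion_loss(latents, target_var=1.0):
    """Hypothetical sketch of a variance-expansion regularizer.

    latents: array of shape (batch, dim) sampled from the VAE encoder.
    target_var: assumed variance floor each latent dimension should reach.
    Returns a scalar penalty that is zero once every dimension's
    batch variance meets the floor, and grows as variance collapses.
    """
    # Per-dimension variance across the batch.
    var = latents.var(axis=0)
    # Hinge penalty: only under-dispersed dimensions are penalized.
    deficit = np.maximum(target_var - var, 0.0)
    return float(deficit.mean())
```

In a training loop this term would be added, with a small weight, to the usual reconstruction and KL objectives; a collapsed batch (all latents near a point) incurs the maximum penalty, while a well-spread batch incurs none.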
From the abstract
Latent diffusion models have emerged as the dominant framework for high-fidelity and efficient image generation, owing to their ability to learn diffusion processes in compact latent spaces. However, while previous research has focused primarily on reconstruction accuracy and semantic alignment of the latent space, we observe that another critical factor, robustness to sampling perturbations, also plays a crucial role in determining generation quality. Through empirical and theoretical analyses, […]