Truncated backpropagation for video decoding reduces the memory cost of fine-tuning video diffusion models from linear to constant.
March 19, 2026
Original Paper
ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation
arXiv · 2603.17812
The Takeaway
ChopGrad makes it feasible to fine-tune video diffusion models with pixel-wise losses on long or high-resolution videos, a setting that was previously computationally intractable. Practitioners can then apply high-fidelity pixel-domain losses for tasks such as super-resolution or inpainting to video sequences on standard hardware.
From the abstract
Recent video diffusion models achieve high-quality generation through recurrent frame processing where each frame generation depends on previous frames. However, this recurrent mechanism means that training such models in the pixel domain incurs prohibitive memory costs, as activations accumulate across the entire video sequence. This fundamental limitation also makes fine-tuning these models with pixel-wise losses computationally intractable for long or high-resolution videos. This paper introd…
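The abstract is cut off before it describes ChopGrad's actual mechanism. The memory argument (linear versus constant) can still be illustrated with generic truncated backpropagation through time on a toy recurrence. This is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a hypothetical scalar "decoder" where each frame is h_t = a * h_{t-1} + z_t with a pixel-wise loss L = sum_t (h_t - y_t)^2. The function names `full_bptt_grad` and `truncated_grad` and the window size `k` are illustrative inventions. Full backpropagation keeps every frame's state for the backward pass. The truncated version lets each frame's loss flow back through at most `k` previous frames, so it only ever stores `k` states.

```python
from collections import deque


# Toy model (assumption, not ChopGrad's architecture):
#   h_t = a * h_{t-1} + z_t, with h_0 = 0; each decoded "frame" is h_t.
#   Pixel-wise loss: L = sum_t (h_t - y_t)^2; we want dL/da.

def forward_states(a, zs):
    """Run the toy recurrence and return all states [h_0, h_1, ..., h_T]."""
    h, hs = 0.0, [0.0]
    for z in zs:
        h = a * h + z
        hs.append(h)
    return hs


def full_bptt_grad(a, zs, ys):
    """Exact dL/da via full backprop; stores all T+1 states (linear memory)."""
    hs = forward_states(a, zs)           # every activation kept for the backward pass
    grad, dh = 0.0, 0.0
    for t in range(len(zs), 0, -1):       # reverse pass over frames T..1
        dh += 2.0 * (hs[t] - ys[t - 1])   # local pixel-wise loss gradient at frame t
        grad += dh * hs[t - 1]            # partial h_t / partial a = h_{t-1}
        dh *= a                           # propagate dL/dh_t into h_{t-1}
    return grad, len(hs)


def truncated_grad(a, zs, ys, k):
    """Approximate dL/da: each frame's loss backpropagates through at most k steps.

    Only the last k previous states are retained, so memory is O(k),
    independent of the number of frames T. With k >= T it is exact.
    """
    window = deque(maxlen=k)              # holds h_{t-k}, ..., h_{t-1}
    h, grad, peak = 0.0, 0.0, 0
    for z, y in zip(zs, ys):
        window.append(h)                  # h_{t-1} enters; oldest state is dropped
        peak = max(peak, len(window))
        h = a * h + z                     # forward step: h_t
        delta = 2.0 * (h - y)             # dL_t/dh_t for this frame only
        for h_prev in reversed(window):   # walk back through at most k steps
            grad += delta * h_prev
            delta *= a                    # chain rule through one more step
    return grad, peak


if __name__ == "__main__":
    zs = [1.0, 0.5, -0.2, 0.3, 0.8, -0.1]
    ys = [0.9, 1.0, 0.6, 0.7, 1.2, 0.9]
    exact, mem_full = full_bptt_grad(0.7, zs, ys)
    approx, mem_trunc = truncated_grad(0.7, zs, ys, k=2)
    print(f"full BPTT: grad={exact:.4f}, states stored={mem_full}")
    print(f"truncated (k=2): grad={approx:.4f}, states stored={mem_trunc}")
```

The window size `k` trades gradient accuracy for memory. Setting `k` to the sequence length recovers the exact gradient, while a small fixed `k` keeps peak storage constant no matter how many frames are decoded. In an autograd framework, the same effect is usually obtained by detaching the recurrent state once it falls outside the window.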