[2603.17812] ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.17812 (cs)
[Submitted on 18 Mar 2026 (v1), last revised 7 Apr 2026 (this version, v2)]

Title: ChopGrad: Pixel-Wise Losses for Latent Video Diffusion via Truncated Backpropagation
Authors: Dmitriy Rivkin, Parker Ewen, Lili Gao, Julian Ost, Stefanie Walz, Rasika Kangutkar, Mario Bijelic, Felix Heide

Abstract: Recent video diffusion models achieve high-quality generation through recurrent frame processing, where the generation of each frame depends on previous frames. However, this recurrence means that training such models in the pixel domain incurs prohibitive memory costs, as activations accumulate across the entire video sequence. This fundamental limitation also makes fine-tuning these models with pixel-wise losses computationally intractable for long or high-resolution videos. This paper introduces ChopGrad, a truncated backpropagation scheme for video decoding that limits gradient computation to local frame windows while maintaining global consistency. We provide a theoretical analysis of this approximation and show that it enables efficient fine-tuning with frame-wise losses. ChopGrad reduces training memory from scaling linearly with the number of video frames (full backpropagation) to constant memory, and compares favora...
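To make the core idea concrete, below is a minimal sketch of truncated backpropagation over local frame windows in PyTorch. It is not the paper's implementation: the decoder, pixel loss, and window size are hypothetical placeholders, and the recurrent context is assumed to be a single tensor.

```python
# Sketch of truncated backpropagation for recurrent frame decoding.
# Assumptions (not from the paper): `decoder(latent, context)` returns
# (frame, new_context), `pixel_loss` is a per-frame loss, and `context`
# is a single tensor that can be detached at window boundaries.
import torch

def finetune_step(decoder, latents, targets, pixel_loss, optimizer, window_size=4):
    """Decode a video frame by frame, backpropagating only through local
    windows of `window_size` frames so activation memory stays constant
    in the total number of frames."""
    context = None
    window_loss = 0.0
    num_frames = len(latents)
    for t, (z_t, x_t) in enumerate(zip(latents, targets)):
        frame, context = decoder(z_t, context)          # recurrent frame decoding
        window_loss = window_loss + pixel_loss(frame, x_t)

        end_of_window = (t + 1) % window_size == 0 or t == num_frames - 1
        if end_of_window:
            optimizer.zero_grad()
            window_loss.backward()                       # gradients span only this window
            optimizer.step()
            window_loss = 0.0
            if context is not None:
                # Cut the graph: no gradients flow into earlier windows,
                # so their activations can be freed.
                context = context.detach()
```

Detaching the context at each window boundary is what caps activation memory at the window size rather than the full video length; the trade-off, as the abstract notes, is that gradients become a local approximation of full backpropagation through the whole sequence.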