[2511.19365] DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2511.19365 (cs)
[Submitted on 24 Nov 2025 (v1), last revised 8 Apr 2026 (this version, v2)]

Title: DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation
Authors: Zehong Ma, Longhui Wei, Shuai Wang, Shiliang Zhang, Qi Tian

Abstract: Pixel diffusion aims to generate images directly in pixel space in an end-to-end fashion. This approach avoids the limitations of the VAE used in two-stage latent diffusion and offers higher model capacity. Existing pixel diffusion models suffer from slow training and inference because they typically model both high-frequency signals and low-frequency semantics within a single diffusion transformer (DiT). To pursue a more efficient pixel diffusion paradigm, we propose the frequency-DeCoupled pixel diffusion framework. With the intuition of decoupling the generation of high- and low-frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance from the DiT, freeing the DiT to specialize in modeling low-frequency semantics. In addition, we introduce a frequency-aware flow-matching loss that emphasizes visually salient frequencies while suppressing insignificant ones. Extensive experiments show that DeCo achieves superior performance among pix...
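The abstract does not specify how the frequency-aware flow-matching loss is computed. As an illustration only, the sketch below shows one plausible way such a loss could re-weight the per-frequency error of a predicted velocity field: transform the residual with a 2D FFT, then down-weight high radial frequencies so that low-frequency (visually salient) bands dominate the objective. The function name, the exponential weighting scheme, and the `alpha` parameter are all assumptions, not the paper's actual formulation.

```python
import numpy as np

def freq_weighted_fm_loss(v_pred, v_target, alpha=1.0):
    """Hypothetical frequency-aware flow-matching loss (illustrative sketch).

    Weights the per-frequency squared error between the predicted and target
    velocity fields so that low-frequency content dominates the objective,
    while high-frequency error is exponentially down-weighted. `alpha`
    controls how quickly the weight decays with radial frequency.
    """
    # 2D FFT of the residual between predicted and target velocity fields.
    err = np.fft.fft2(v_pred - v_target)

    h, w = v_pred.shape[-2:]
    # Radial frequency magnitude per FFT bin, normalized to [0, 1].
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2) / np.sqrt(0.5)

    # Emphasize low frequencies: weight decays with radial frequency.
    weight = np.exp(-alpha * radius)
    return float(np.mean(weight * np.abs(err) ** 2))
```

In this sketch, a perfect prediction gives a loss of exactly zero, and any error concentrated in high-frequency bins contributes less than the same error at low frequencies, which is the qualitative behavior the abstract describes ("emphasizes visually salient frequencies while suppressing insignificant ones").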