[2602.14464] CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Summary
The paper presents CoCoDiff, a training-free framework for fine-grained style transfer in images that enforces pixel-wise semantic correspondence while achieving high visual quality.
Why It Matters
CoCoDiff addresses a critical challenge in computer vision by enhancing style transfer techniques to maintain semantic consistency at a pixel level. This advancement is significant for applications in art generation, image editing, and augmented reality, where preserving object integrity is essential.
Key Takeaways
- CoCoDiff utilizes pretrained latent diffusion models for style transfer.
- The framework introduces a pixel-wise semantic correspondence module for better alignment.
- Cycle-consistency is enforced to maintain structural and perceptual integrity.
- It achieves state-of-the-art results without requiring additional training.
- Because no training is required, the method is low-cost and readily applicable.
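The dense alignment idea from the takeaways above can be sketched as nearest-neighbour matching between feature vectors. This is a minimal illustration, not the paper's implementation: the choice of intermediate diffusion features, their resolution, and the matching criterion (cosine similarity here) are all assumptions.

```python
import numpy as np

def dense_alignment_map(feat_content, feat_style):
    """Nearest-neighbour matching of L2-normalised feature vectors.

    feat_content, feat_style: (C, H, W) feature maps, e.g. intermediate
    diffusion U-Net features upsampled to a common resolution (an
    assumption; the paper's exact feature choice is not detailed here).
    Returns an (H*W,) index array: for each content pixel, the flat
    index of its most similar style pixel under cosine similarity.
    """
    C = feat_content.shape[0]
    fc = feat_content.reshape(C, -1).T  # (H*W, C)
    fs = feat_style.reshape(C, -1).T    # (H*W, C)
    # Normalise so the dot product below equals cosine similarity.
    fc = fc / (np.linalg.norm(fc, axis=1, keepdims=True) + 1e-8)
    fs = fs / (np.linalg.norm(fs, axis=1, keepdims=True) + 1e-8)
    sim = fc @ fs.T                     # (H*W, H*W) similarity matrix
    return sim.argmax(axis=1)
```

With identical content and style features, each pixel is its own best match, so the map reduces to the identity, which is a quick sanity check for the matching step.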
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14464 (cs) [Submitted on 16 Feb 2026]
Title: CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
Authors: Wenbo Nie, Zixiang Li, Renshuai Tao, Bin Wu, Yunchao Wei, Yao Zhao
Abstract: Transferring visual style between images while preserving semantic correspondence between similar objects remains a central challenge in computer vision. While existing methods have made great strides, most operate at the global level and overlook region-wise and even pixel-wise semantic correspondence. To address this, we propose CoCoDiff, a novel training-free and low-cost style transfer framework that leverages pretrained latent diffusion models to achieve fine-grained, semantically consistent stylization. We identify that correspondence cues within generative diffusion models are under-explored and that content consistency across semantically matched regions is often neglected. CoCoDiff introduces a pixel-wise semantic correspondence module that mines intermediate diffusion features to construct a dense alignment map between content and style images. A cycle-consistency module then enforces structural and perceptual alignment across iterations, yielding object- and region-level stylization that preserv...
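The cycle-consistency idea in the abstract can be illustrated with a round-trip check on index maps: a content pixel's match is trusted only if following the content→style map and then the style→content map returns to the starting pixel. This is a hedged sketch of the general technique, not the paper's module; the function name and exact criterion are assumptions.

```python
import numpy as np

def cycle_consistent_mask(map_c2s, map_s2c):
    """Flag content pixels whose content->style->content round trip
    returns to the starting pixel (a standard cycle-consistency check;
    an illustrative stand-in for the paper's module).

    map_c2s: for each content pixel, the index of its matched style pixel.
    map_s2c: for each style pixel, the index of its matched content pixel.
    Returns a boolean mask over content pixels; only cycle-consistent
    matches would be kept when transferring style.
    """
    idx = np.arange(len(map_c2s))
    round_trip = map_s2c[map_c2s]  # follow the forward map, then the backward map
    return round_trip == idx
```

For example, if content pixel 1 maps to style pixel 0 but style pixel 0 maps back to content pixel 0, pixel 1 fails the check and its match would be discarded as unreliable.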