[2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Summary
ColoDiff is a diffusion-based framework for generating colonoscopy videos that are dynamically consistent and content-aware, aimed at alleviating data scarcity in clinical settings.
Why It Matters
Colonoscopy videos carry dynamic information that is critical for diagnosing intestinal diseases, yet such data are scarce in clinical settings. Generating them well is difficult because of irregular intestinal structures, diverse disease representations, and varied imaging modalities. By targeting temporal consistency and control over clinical attributes, ColoDiff has the potential to enhance clinical analysis and patient outcomes, especially in data-scarce environments.
Key Takeaways
- ColoDiff utilizes a diffusion-based framework for video generation.
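- The TimeStream module decouples temporal dependency from video sequences through cross-frame tokenization, enabling intricate dynamic modeling of colonoscopy videos.
- The Content-Aware module uses noise-injected embeddings and learnable prototypes to allow precise control over clinical attributes.
- A non-Markovian sampling strategy significantly reduces generation time.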
- ColoDiff shows promise in complementing real videos for clinical analysis.
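To make the non-Markovian sampling takeaway concrete, here is a minimal sketch of one well-known non-Markovian sampler, the deterministic DDIM update, run over a strided subset of timesteps. This is an illustration only: the text above does not say which sampler ColoDiff uses, and the linear beta schedule, the step counts, the scalar "sample", and every function name here are assumptions made for the example.

```python
import math


def make_alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_train_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_train_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def strided_timesteps(num_train_steps, num_sample_steps):
    """Evenly spaced subset of the training timesteps, highest first."""
    stride = num_train_steps // num_sample_steps
    return list(range(num_train_steps - 1, -1, -stride))[:num_sample_steps]


def ddim_sample(eps_model, x_T, alpha_bars, timesteps):
    """Deterministic DDIM sampling (eta = 0) over the given timesteps.

    Each update jumps from timestep t directly to the next kept timestep,
    which is what lets the sampler skip most of the training schedule.
    """
    x = x_T
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        # After the last kept timestep we land on the clean sample (alpha_bar = 1).
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        eps = eps_model(x, t)
        # Estimate the clean sample implied by the current noise prediction.
        x0_pred = (x - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
        # Re-noise that estimate to the (much earlier) target timestep.
        x = math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps
    return x


# Toy stand-in for a trained network: if all data equal TARGET, the exact
# noise prediction has a closed form. A real model would be a neural network.
TARGET = 0.7
ALPHA_BARS = make_alpha_bars()


def toy_eps_model(x, t):
    a_t = ALPHA_BARS[t]
    return (x - math.sqrt(a_t) * TARGET) / math.sqrt(1.0 - a_t)


steps = strided_timesteps(num_train_steps=1000, num_sample_steps=50)
sample = ddim_sample(toy_eps_model, x_T=1.3, alpha_bars=ALPHA_BARS, timesteps=steps)
```

In this toy setup the sampler calls the model 50 times instead of 1000, which is the kind of saving a non-Markovian strategy can offer; actual speedups for video generation depend on the model and on the sampler the authors chose.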
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.23203 (cs)
[Submitted on 26 Feb 2026]
Title: ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Authors: Junhu Fu, Shuyu Liang, Wutong Li, Chen Ma, Peng Huang, Kehao Wang, Ke Chen, Shengli Lin, Pinghong Zhou, Zeju Li, Yuanyuan Wang, Yi Guo
Abstract: Colonoscopy video generation delivers dynamic, information-rich data critical for diagnosing intestinal diseases, particularly in data-scarce scenarios. High-quality video generation demands temporal consistency and precise control over clinical attributes, but faces challenges from irregular intestinal structures, diverse disease representations, and various imaging modalities. To this end, we propose ColoDiff, a diffusion-based framework that generates dynamic-consistent and content-aware colonoscopy videos, aiming to alleviate data shortage and assist clinical analysis. At the inter-frame level, our TimeStream module decouples temporal dependency from video sequences through a cross-frame tokenization mechanism, enabling intricate dynamic modeling despite irregular intestinal structures. At the intra-frame level, our Content-Aware module incorporates noise-injected embeddings and learnable prototypes to realize precise contr...