[2312.17505] Catch Me If You Can Describe Me: Open-Vocabulary Camouflaged Instance Segmentation with Diffusion
Computer Science > Computer Vision and Pattern Recognition

arXiv:2312.17505 (cs)

[Submitted on 29 Dec 2023 (v1), last revised 3 Mar 2026 (this version, v2)]

Title: Catch Me If You Can Describe Me: Open-Vocabulary Camouflaged Instance Segmentation with Diffusion

Authors: Tuan-Anh Vu, Duc Thanh Nguyen, Qing Guo, Nhat Chung, Binh-Son Hua, Ivor W. Tsang, Sai-Kit Yeung

Abstract: Text-to-image diffusion techniques have shown exceptional capabilities in producing high-quality, dense visual predictions from open-vocabulary text. This indicates a strong correlation between the visual and textual domains for open concepts, and that diffusion-based text-to-image models can capture rich and diverse information for computer vision tasks. However, we found that these advantages do not hold for learning features of camouflaged instances, owing to the significant blending between their visual boundaries and their surroundings. In this paper, while leveraging the benefits of diffusion-based techniques and text-image models in open-vocabulary settings, we address a challenging problem in computer vision: open-vocabulary camouflaged instance segmentation (OVCIS). Specifically, we propose a method built upon state-of-the-art diffusion, empowered by open-vocabulary techniques, to learn multi-scale textual-visual...