[2505.15263] gen2seg: Generative Models Enable Generalizable Instance Segmentation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2505.15263 (cs)
[Submitted on 21 May 2025 (v1), last revised 2 Apr 2026 (this version, v3)]

Title: gen2seg: Generative Models Enable Generalizable Instance Segmentation
Authors: Om Khangaonkar, Hamed Pirsiavash

Abstract: By pretraining to synthesize coherent images from perturbed inputs, generative models inherently learn to understand object boundaries and scene compositions. How can we repurpose these generative representations for general-purpose perceptual organization? We finetune Stable Diffusion and MAE (encoder+decoder) for category-agnostic instance segmentation using our instance coloring loss exclusively on a narrow set of object types (indoor furnishings and cars). Surprisingly, our models exhibit strong zero-shot generalization, accurately segmenting objects of types and styles unseen in finetuning. This holds even for MAE, which is pretrained on unlabeled ImageNet-1K only. When evaluated on unseen object types and styles, our best-performing models closely approach the heavily supervised SAM, and outperform it when segmenting fine structures and ambiguous boundaries. In contrast, existing promptable segmentation architectures or discriminatively pretrained models fail to generalize. This suggests that generative models learn an inhe...
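The abstract names an "instance coloring loss" but does not specify it. Below is a minimal PyTorch sketch of one plausible instance-coloring-style objective, assuming per-pixel 3-channel "color" predictions and ground-truth instance masks: pixels of the same instance are pulled toward a shared mean color, and different instances' mean colors are pushed apart by a margin. This is an illustrative assumption, not the paper's exact formulation; the function name, margin parameter, and mask format are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's exact instance coloring loss.
import torch

def instance_coloring_loss(pred, masks, margin=0.5):
    """
    pred:  (B, 3, H, W) predicted per-pixel "colors"
    masks: list of length B; each element is a (K_i, H, W) boolean tensor,
           one mask per ground-truth instance
    """
    total = pred.new_zeros(())
    for b, inst_masks in enumerate(masks):
        means = []
        for m in inst_masks:                       # one boolean mask per instance
            pix = pred[b][:, m]                    # (3, N) colors of this instance's pixels
            mu = pix.mean(dim=1)                   # mean color of the instance
            # pull term: pixels toward their instance's mean color
            total = total + ((pix - mu[:, None]) ** 2).mean()
            means.append(mu)
        if len(means) > 1:
            means = torch.stack(means)             # (K, 3)
            d = torch.cdist(means, means)          # pairwise distances between mean colors
            off_diag = ~torch.eye(len(means), dtype=torch.bool, device=d.device)
            # push term: different instances' colors at least `margin` apart
            total = total + torch.clamp(margin - d[off_diag], min=0).mean()
    return total / max(len(masks), 1)
```

Under this kind of objective, instance identity is expressed through color consistency rather than class labels, which is consistent with the category-agnostic setup the abstract describes.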