[2505.15263] gen2seg: Generative Models Enable Generalizable Instance Segmentation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2505.15263 (cs)
[Submitted on 21 May 2025 (v1), last revised 2 Apr 2026 (this version, v3)]

Title: gen2seg: Generative Models Enable Generalizable Instance Segmentation
Authors: Om Khangaonkar, Hamed Pirsiavash

Abstract: By pretraining to synthesize coherent images from perturbed inputs, generative models inherently learn to understand object boundaries and scene compositions. How can we repurpose these generative representations for general-purpose perceptual organization? We finetune Stable Diffusion and MAE (encoder+decoder) for category-agnostic instance segmentation using our instance coloring loss exclusively on a narrow set of object types (indoor furnishings and cars). Surprisingly, our models exhibit strong zero-shot generalization, accurately segmenting objects of types and styles unseen in finetuning. This holds even for MAE, which is pretrained on unlabeled ImageNet-1K only. When evaluated on unseen object types and styles, our best-performing models closely approach the heavily supervised SAM, and outperform it when segmenting fine structures and ambiguous boundaries. In contrast, existing promptable segmentation architectures or discriminatively pretrained models fail to generalize. This suggests that generative models learn an inhe...
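The abstract names an "instance coloring loss" but does not specify it. Below is a minimal PyTorch sketch of one plausible instance-coloring-style objective, assuming per-pixel 3-channel "color" predictions and ground-truth instance masks: pixels of the same instance are pulled toward a shared mean color, and different instances' mean colors are pushed apart by a margin. This is an illustrative assumption, not the paper's exact formulation; the function name, margin parameter, and mask format are hypothetical.

```python
# Illustrative sketch only -- NOT the paper's exact instance coloring loss.
import torch

def instance_coloring_loss(pred, masks, margin=0.5):
    """
    pred:  (B, 3, H, W) predicted per-pixel "colors"
    masks: list of length B; each element is a (K_i, H, W) boolean tensor,
           one mask per ground-truth instance
    """
    total = pred.new_zeros(())
    for b, inst_masks in enumerate(masks):
        means = []
        for m in inst_masks:                       # one boolean mask per instance
            pix = pred[b][:, m]                    # (3, N) colors of this instance's pixels
            mu = pix.mean(dim=1)                   # mean color of the instance
            # pull term: pixels toward their instance's mean color
            total = total + ((pix - mu[:, None]) ** 2).mean()
            means.append(mu)
        if len(means) > 1:
            means = torch.stack(means)             # (K, 3)
            d = torch.cdist(means, means)          # pairwise distances between mean colors
            off_diag = ~torch.eye(len(means), dtype=torch.bool, device=d.device)
            # push term: different instances' colors at least `margin` apart
            total = total + torch.clamp(margin - d[off_diag], min=0).mean()
    return total / max(len(masks), 1)
```

Under this kind of objective, instance identity is expressed through color consistency rather than class labels, which is consistent with the category-agnostic setup the abstract describes.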