[2411.15087] Phrase-Instance Alignment for Generalized Referring Segmentation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2411.15087 (cs)
[Submitted on 22 Nov 2024 (v1), last revised 24 Mar 2026 (this version, v2)]
Title: Phrase-Instance Alignment for Generalized Referring Segmentation
Authors: E-Ro Nguyen, Hieu Le, Dimitris Samaras, Michael S. Ryoo
Abstract: Generalized referring expressions can describe one object, several related objects, or none at all. Existing generalized referring segmentation (GRES) models treat all cases alike, predicting a single binary mask and ignoring how linguistic phrases correspond to distinct visual instances. To address this, we reformulate GRES as an instance-level reasoning problem, where the model first predicts multiple instance-aware object queries conditioned on the referring expression, then aligns each with its most relevant phrase. This alignment is enforced by a Phrase-Object Alignment (POA) loss that builds fine-grained correspondence between linguistic phrases and visual instances. Given these aligned object instance queries and their learned relevance scores, the final segmentation and the no-target case are both inferred through a unified relevance-weighted aggregation mechanism. This instance-aware formulation enables explicit phrase-instance grounding, interpretable reasoning, and robust handling of complex or null expressions....
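The abstract does not give formulas for the alignment or aggregation steps, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that each object query is matched to the phrase with the highest cosine similarity, that the final mask is a clipped relevance-weighted sum of per-instance mask probabilities, and that the no-target case is declared when no relevance score reaches a threshold. All function names, parameters, and threshold values are hypothetical, and the POA loss itself is not modeled.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is zero).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def align_queries_to_phrases(query_embs, phrase_embs):
    # Assumption: "most relevant phrase" is taken as the argmax of cosine similarity.
    return [
        max(range(len(phrase_embs)), key=lambda j: cosine(q, phrase_embs[j]))
        for q in query_embs
    ]


def aggregate(mask_logits, relevance, no_target_thresh=0.5, pixel_thresh=0.5):
    # mask_logits: N instance masks, each an H x W nested list of logits.
    # relevance: N scores in [0, 1], one per instance query.
    # Returns (binary H x W mask, no_target flag).
    h, w = len(mask_logits[0]), len(mask_logits[0][0])
    # Assumption: no instance is relevant enough -> predict an empty mask.
    if max(relevance) < no_target_thresh:
        return [[0] * w for _ in range(h)], True
    mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Relevance-weighted sum of instance probabilities, clipped to 1.
            score = min(
                1.0,
                sum(r * sigmoid(m[y][x]) for m, r in zip(mask_logits, relevance)),
            )
            mask[y][x] = 1 if score >= pixel_thresh else 0
    return mask, False
```

In this sketch a single set of relevance scores drives both outputs: instances with high relevance dominate the aggregated mask, and uniformly low relevance yields the empty, no-target prediction.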