[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
Summary
The paper presents AMLRIS, a training strategy for Referring Image Segmentation (RIS) that estimates pixel-level vision-language alignment and filters out poorly aligned regions during optimization, achieving state-of-the-art results on the RefCOCO datasets.
Why It Matters
This research addresses the challenge of accurately segmenting the object in an image that a natural language expression refers to, a core vision-language task in computer vision. By improving the alignment between visual and linguistic features, it makes RIS systems more robust and reliable, which matters for applications such as robotics and human-computer interaction.
Key Takeaways
- AMLRIS introduces a new training strategy for Referring Image Segmentation.
- The method focuses on pixel-level vision-language alignment to improve segmentation accuracy.
- It filters out poorly aligned regions during optimization so that training focuses on trustworthy cues.
- It achieves state-of-the-art results on the RefCOCO datasets.
- It improves robustness to diverse descriptions and scenarios.
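The takeaways above describe estimating pixel-level alignment and dropping poorly aligned regions during optimization. The following is a minimal sketch of one possible reading of that idea, not the paper's implementation: it assumes alignment is scored as cosine similarity between a per-pixel feature and a single text embedding, that a fixed threshold decides which pixels are kept, and that filtering is applied to a per-pixel binary cross-entropy loss. The function names (`cosine`, `alignment_mask`, `masked_bce`) and the `threshold` parameter are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def alignment_mask(pixel_feats, text_feat, threshold=0.5):
    """Assumed alignment score: keep (1) pixels whose feature has cosine
    similarity >= threshold with the text embedding, drop (0) the rest."""
    return [1 if cosine(f, text_feat) >= threshold else 0 for f in pixel_feats]


def masked_bce(probs, targets, mask, eps=1e-7):
    """Binary cross-entropy averaged over kept pixels only.

    Dropped pixels contribute nothing to the loss, so they would produce
    no gradient in a real training loop. Returns 0.0 if nothing is kept.
    """
    kept = [(p, t) for p, t, m in zip(probs, targets, mask) if m]
    if not kept:
        return 0.0
    total = 0.0
    for p, t in kept:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(kept)


# Toy example: four "pixels" with 2-D features and one text embedding.
pixel_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
text_feat = [1.0, 0.0]
probs = [0.9, 0.8, 0.2, 0.6]   # predicted foreground probabilities
targets = [1, 1, 0, 0]         # ground-truth mask

mask = alignment_mask(pixel_feats, text_feat)  # [1, 1, 0, 0]
loss = masked_bce(probs, targets, mask)
```

In this toy setup the last two pixels are treated as poorly aligned and excluded, so the loss is computed from the first two pixels only. A real system would compute the alignment scores from learned vision and language encoders and would likely apply the mask inside a differentiable framework; those details are not given in the summary above.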
arXiv:2602.22740 (cs.CV)
[Submitted on 26 Feb 2026]
Title: AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
Authors: Tongfei Chen, Shuo Yang, Yuguang Yang, Linlin Yang, Runtang Guo, Changbai Li, He Long, Chunyu Xie, Dawei Leng, Baochang Zhang
Abstract: Referring Image Segmentation (RIS) aims to segment an object in an image identified by a natural language expression. The paper introduces Alignment-Aware Masked Learning (AML), a training strategy to enhance RIS by explicitly estimating pixel-level vision-language alignment, filtering out poorly aligned regions during optimization, and focusing on trustworthy cues. This approach results in state-of-the-art performance on RefCOCO datasets and also enhances robustness to diverse descriptions and scenarios.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.22740 [cs.CV] (or arXiv:2602.22740v1 [cs.CV] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.22740 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Tongfei Chen
[v1] Thu, 26 Feb 2026 08:29:04 UTC (11,417 KB)