[2603.04158] GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning
Computer Science > Robotics

arXiv:2603.04158 (cs) [Submitted on 4 Mar 2026]

Title: GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning

Authors: Mingleyang Li, Yuran Wang, Yue Chen, Tianxing Chen, Jiaqi Liang, Zishun Shen, Haoran Lu, Ruihai Wu, Hao Dong

Abstract: Garment manipulation has attracted increasing attention due to its critical role in home-assistant robotics. However, most existing garment manipulation works assume an initial state consisting of only one garment, while piled garments are far more common in real-world settings. To bridge this gap, we propose a novel garment retrieval pipeline that not only follows language instructions to execute safe and clean retrieval but also guarantees that exactly one garment is retrieved per attempt, establishing a robust foundation for the execution of downstream tasks (e.g., folding, hanging, wearing). Our pipeline seamlessly integrates vision-language reasoning with visual affordance perception, fully leveraging the high-level reasoning and planning capabilities of VLMs alongside the generalization power of visual affordance for low-level actions. To enhance the VLM's comprehensive awareness of each garment's state within a garment pile, we employ a visual segmentation model (SAM2) to execut...
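The two-stage pipeline described in the abstract, in which a VLM first reasons over segmented garment candidates to pick a target and a visual affordance map then selects the low-level grasp point, could be sketched roughly as follows. This is a minimal illustration under stated assumptions: the function name `choose_grasp`, the dictionary of per-garment affordance grids, and the toy scores are all hypothetical and not taken from the paper's implementation.

```python
def choose_grasp(affordance_maps, target_id):
    """Pick the highest-affordance pixel on the VLM-selected garment.

    affordance_maps: dict mapping garment id -> 2D grid of affordance
        scores in [0, 1], one grid per segmented garment (hypothetical
        stand-in for the paper's affordance perception output).
    target_id: the garment chosen by the vision-language reasoning step.
    Returns (row, col) of the best grasp point.
    """
    grid = affordance_maps[target_id]
    best_score, best_rc = float("-inf"), None
    for r, row in enumerate(grid):
        for c, score in enumerate(row):
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc

# Toy example: two segmented garments; the VLM has selected "shirt_1".
maps = {
    "shirt_1": [[0.1, 0.9], [0.3, 0.2]],
    "towel_2": [[0.5, 0.4], [0.6, 0.1]],
}
print(choose_grasp(maps, "shirt_1"))  # -> (0, 1)
```

The split mirrors the division of labor the abstract emphasizes: high-level target selection by the VLM, low-level grasp selection by per-garment affordance, so each component can be improved independently.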