[2603.24275] Language-Assisted Image Clustering Guided by Discriminative Relational Signals and Adaptive Semantic Centers
Computer Science > Machine Learning
arXiv:2603.24275 (cs)
[Submitted on 25 Mar 2026]

Authors: Jun Ma, Xu Zhang, Zhengxing Jiao, Yaxin Hou, Hui Liu, Junhui Hou, Yuheng Jia

Abstract: Language-Assisted Image Clustering (LAIC) uses vision-language models (VLMs) to augment input images with additional text in order to improve clustering performance. Despite recent progress, existing LAIC methods often overlook two issues: (i) the textual features constructed for each image are highly similar, leading to weak inter-class discriminability; (ii) the clustering step is restricted to pre-built image-text alignments, which limits how well the text modality can be used. To address these issues, we propose a new LAIC framework with two complementary components. First, we exploit cross-modal relations to produce more discriminative self-supervision signals for clustering, an approach that is compatible with the training mechanisms of most VLMs. Second, we learn category-wise continuous semantic centers via prompt learning to produce the final clustering assignments. Extensive experiments on eight benchmark datasets demonstrate that our method achieves an average improvement of 2.6% o...