[2502.00618] DesCLIP: Robust Continual Learning via General Attribute Descriptions for VLM-Based Visual Recognition
Computer Science > Computer Vision and Pattern Recognition

arXiv:2502.00618 (cs)

[Submitted on 2 Feb 2025 (v1), last revised 21 Mar 2026 (this version, v3)]

Title: DesCLIP: Robust Continual Learning via General Attribute Descriptions for VLM-Based Visual Recognition

Authors: Chiyuan He, Zihuan Qiu, Fanman Meng, Linfeng Xu, Qingbo Wu, Hongliang Li

Abstract: Continual learning of vision-language models (VLMs) focuses on leveraging cross-modal pretrained knowledge to adapt incrementally to expanding downstream tasks and datasets while tackling knowledge forgetting. Existing research often connects visual features only with specific class text in downstream tasks, overlooking the latent relationships between general and specialized knowledge. Our findings reveal that forcing models to optimize inappropriate visual-text matches exacerbates forgetting of the VLM's recognition ability. To tackle this issue, we propose DesCLIP, which leverages general attribute (GA) descriptions to guide the understanding of specific class objects, enabling VLMs to establish robust vision-GA-class trilateral associations rather than relying solely on vision-class connections. Specifically, we introduce a language assistant to generate concrete GA description candidates via proper requ...
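The abstract describes scoring classes through a vision-GA-class association rather than a direct vision-class match alone. As a rough illustration of that idea (not the paper's actual formulation), the sketch below combines an image embedding's cosine similarity to class-text embeddings with its similarity to per-class mean embeddings of general attribute descriptions; the function name `trilateral_logits` and the mixing weight `alpha` are hypothetical, and the embeddings would in practice come from a CLIP-style encoder.

```python
import numpy as np

def l2norm(x, axis=-1):
    """Normalize vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def trilateral_logits(img, class_emb, ga_embs, alpha=0.5):
    """Illustrative class scoring mixing two similarity paths.

    img       : (d,) image embedding.
    class_emb : (C, d) class-text embeddings (vision-class path).
    ga_embs   : list of C arrays, each (n_i, d), one GA-description
                embedding set per class (vision-GA path).
    alpha     : hypothetical mixing weight between the two paths.
    """
    img = l2norm(img)
    class_emb = l2norm(class_emb)
    vc = class_emb @ img  # cosine similarity to each class's text embedding

    # Represent each class by the mean of its GA-description embeddings.
    ga_mean = l2norm(np.stack([l2norm(g).mean(axis=0) for g in ga_embs]))
    vg = ga_mean @ img    # similarity to each class's GA representation

    return alpha * vc + (1 - alpha) * vg
```

For example, with an image embedding aligned to class 0's text and to GA descriptions of class 0, the combined score ranks class 0 first; the point of the extra GA term is that a class can still score well through attribute descriptions even when the direct vision-class match is weak.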