[2604.03980] Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.03980 (cs) [Submitted on 5 Apr 2026]
Title: Gram-Anchored Prompt Learning for Vision-Language Models via Second-Order Statistics
Authors: Minglei Chen, Weilong Wang, Jiang Duan, Ye Deng

Abstract: Parameter-efficient prompt learning has become the de facto standard for adapting Vision-Language Models (VLMs) to downstream tasks. Existing approaches predominantly focus on aligning text prompts with first-order visual features (i.e., spatial feature maps). While effective for fine-grained semantic discrimination, we argue that relying solely on first-order information is insufficient for robust adaptation, as these spatially entangled features are highly susceptible to domain shifts and local noise. In this work, we propose \textbf{Gram-Anchored Prompt Learning (GAPL)} for Vision-Language Models via Second-Order Statistics, a framework that synergizes local semantic alignment with global structural consistency. Methodologically, we introduce an additional second-order statistical stream via \textbf{Gram matrices} that augments the standard first-order spatial interaction. By anchoring prompts to these second-order priors, our approach enables language representations to dynamically adapt to statistical distribution shifts across div...
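The second-order statistic the abstract refers to can be illustrated concretely. A minimal NumPy sketch (an illustration of the standard Gram-matrix construction, not the paper's implementation) computes channel-wise co-activations from a spatial feature map; the toy check at the end shows why such statistics are less tied to spatial layout than first-order features: permuting spatial positions leaves the Gram matrix unchanged.

```python
import numpy as np

def gram_matrix(feat):
    """Second-order statistics of a spatial feature map.

    feat: (C, H, W) array of first-order visual features.
    Returns the (C, C) Gram matrix, normalized by the number of
    spatial positions so its scale is resolution-independent.
    """
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)     # flatten spatial dimensions
    return X @ X.T / (H * W)      # channel-wise co-activation

# Toy check: the Gram matrix discards spatial arrangement, so a
# spatially permuted feature map yields identical statistics.
rng = np.random.default_rng(0)
f = rng.standard_normal((8, 4, 4))
perm = rng.permutation(16)
f_shuffled = f.reshape(8, 16)[:, perm].reshape(8, 4, 4)
assert np.allclose(gram_matrix(f), gram_matrix(f_shuffled))
```

Because the Gram matrix is symmetric and invariant to spatial permutation, it captures the global feature-correlation structure that the abstract contrasts with spatially entangled first-order maps.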