[2511.13494] Language-Guided Invariance Probing of Vision-Language Models
Summary
This article introduces Language-Guided Invariance Probing (LGIP), a benchmark for evaluating the robustness of vision-language models (VLMs) against linguistic perturbations.
Why It Matters
Understanding how VLMs respond to linguistic variations is crucial for improving their reliability in real-world applications. LGIP offers a new diagnostic tool to assess linguistic robustness, which is often overlooked by traditional accuracy metrics.
Key Takeaways
- LGIP measures invariance to paraphrases and sensitivity to semantic changes in image-text matching.
- Across the nine VLMs evaluated, the benchmark reveals clear performance disparities, highlighting each model's strengths and weaknesses.
- EVA02-CLIP and large OpenCLIP variants demonstrate favorable invariance-sensitivity balance.
- Standard retrieval metrics largely miss these robustness failures, motivating new evaluation methods such as LGIP.
- The findings can guide future research in enhancing VLMs' linguistic capabilities.
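To make the invariance and sensitivity notions above concrete, here is a minimal sketch of the two core statistics, assuming a generic `score(image, caption)` image-text matching function (e.g. a CLIP-style cosine similarity). The function names and signatures are illustrative, not LGIP's actual implementation:

```python
import statistics

def invariance_error(score, image, caption, paraphrases):
    """Spread of matching scores across meaning-preserving paraphrases.
    A linguistically robust model should score every paraphrase close
    to the original caption, giving a variance near zero."""
    scores = [score(image, caption)] + [score(image, p) for p in paraphrases]
    return statistics.pvariance(scores)

def sensitivity_gap(score, image, caption, flipped):
    """Difference between the original caption's score and its
    meaning-changing flip. A positive gap is the desired behavior;
    a negative gap means the model prefers the flipped caption."""
    return score(image, caption) - score(image, flipped)
```

Averaging these per-image quantities over the dataset yields benchmark-level summaries; a positive-rate statistic would then be the fraction of examples with a positive gap.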
arXiv:2511.13494 [cs.CV] (Submitted on 17 Nov 2025)
Author: Jae Joong Lee
Abstract: Recent vision-language models (VLMs) such as CLIP, OpenCLIP, EVA02-CLIP and SigLIP achieve strong zero-shot performance, but it is unclear how reliably they respond to controlled linguistic perturbations. We introduce Language-Guided Invariance Probing (LGIP), a benchmark that measures (i) invariance to meaning-preserving paraphrases and (ii) sensitivity to meaning-changing semantic flips in image-text matching. Using 40k MS COCO images with five human captions each, we automatically generate paraphrases and rule-based flips that alter object category, color or count, and summarize model behavior with an invariance error, a semantic sensitivity gap and a positive-rate statistic. Across nine VLMs, EVA02-CLIP and large OpenCLIP variants lie on a favorable invariance-sensitivity frontier, combining low paraphrase-induced variance with consistently higher scores for original captions than for their flipped counterparts. In contrast, SigLIP and SigLIP2 show much larger invariance error and often prefer flipped captions to the human descriptions, especially for object and color edits. These failures are largely invisible to standard retrieval metrics.
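The rule-based flips described in the abstract (altering object category, color, or count) can be sketched as simple word-substitution rules. The substitution tables and helper below are illustrative assumptions, not LGIP's actual rule set:

```python
import re

# Illustrative substitution tables; LGIP's actual rules may differ.
OBJECT_SWAPS = {"dog": "cat", "car": "bicycle"}
COLOR_SWAPS = {"red": "blue", "black": "white"}
COUNT_SWAPS = {"two": "three", "one": "four"}

def flip_caption(caption, table):
    """Apply the first matching rule to produce a meaning-changing flip.
    Word boundaries prevent partial-word matches (e.g. 'red' in 'bored')."""
    for src, dst in table.items():
        pattern = r"\b" + re.escape(src) + r"\b"
        if re.search(pattern, caption):
            return re.sub(pattern, dst, caption, count=1)
    return None  # no rule applies; this caption yields no flip
```

For example, applying `COLOR_SWAPS` to "a dog on a red couch" yields "a dog on a blue couch", a caption that no longer matches the image and should score lower under a robust model.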