[2602.12506] On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs

arXiv - Machine Learning 4 min read Article

Summary

This article examines the robustness and chain-of-thought consistency of reinforcement learning (RL) fine-tuned vision language models (VLMs), highlighting their vulnerabilities and the impact of training methods on model reliability.

Why It Matters

As AI models become integral in reasoning tasks, understanding their limitations is crucial for developing more reliable systems. This research sheds light on the trade-offs between accuracy and robustness, emphasizing the need for improved training protocols that ensure both performance and faithfulness in model outputs.

Key Takeaways

  • RL fine-tuning enhances VLMs but introduces vulnerabilities.
  • Textual perturbations significantly affect model robustness and confidence.
  • Accuracy gains can come at the cost of reliability and faithfulness.
  • Adversarial augmentation alone does not guarantee robustness.
  • Faithfulness-aware rewards can help align reasoning with outputs.

Computer Science > Machine Learning
arXiv:2602.12506 (cs) [Submitted on 13 Feb 2026]

Title: On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
Authors: Rosie Zhao, Anshul Shah, Xiaoyu Zhu, Xinke Deng, Zhongyu Jiang, Yang Yang, Joerg Liebelt, Arnab Mondal

Abstract: Reinforcement learning (RL) fine-tuning has become a key technique for enhancing large language models (LLMs) on reasoning-intensive tasks, motivating its extension to vision language models (VLMs). While RL-tuned VLMs improve on visual reasoning benchmarks, they remain vulnerable to weak visual grounding, hallucinations, and over-reliance on textual cues. We show that simple, controlled textual perturbations--misleading captions or incorrect chain-of-thought (CoT) traces--cause substantial drops in robustness and confidence, and that these effects are more pronounced when CoT consistency is taken into account across open-source multimodal reasoning models. Entropy-based metrics further show that these perturbations reshape model uncertainty and probability mass on the correct option, exposing model-specific trends in miscalibration. To better understand these vulnerabilities, we further analyze RL fine-tuning dynamics and uncover an accuracy-faithfulness trade-off: fine-tuning raises benchmark accuracy, but can simultaneously erode the re...
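The abstract's entropy-based analysis can be illustrated with a minimal sketch: given a model's logits over the multiple-choice options before and after a textual perturbation, compute the change in answer-distribution entropy and in probability mass on the correct option. This is a hypothetical illustration of the general technique, not the paper's actual evaluation code; the function names and the metric definitions chosen here are assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of option logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy (nats) of a discrete distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def calibration_shift(clean_logits, perturbed_logits, correct_idx):
    """Compare answer distributions before/after a textual perturbation.

    Hypothetical metric sketch: a positive entropy_delta means the
    perturbation made the model more uncertain; a negative
    correct_mass_delta means probability mass moved off the correct option.
    """
    p_clean = softmax(clean_logits)
    p_pert = softmax(perturbed_logits)
    return {
        "entropy_delta": entropy(p_pert) - entropy(p_clean),
        "correct_mass_delta": p_pert[correct_idx] - p_clean[correct_idx],
    }
```

For example, a model that is confidently correct on the clean input (a sharply peaked distribution on the right option) but nearly uniform after a misleading caption would show a large positive entropy shift and a large drop in correct-option mass, the miscalibration pattern the abstract describes.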

Related Articles

AWS and Anthropic Advancing AI-powered Cybersecurity With Claude Mythos
AI News - General · 1 min

Gemini gets notebooks to help you organize projects | The Verge
Google's Gemini is getting a feature called "notebooks" to help you organize things about certain topics in a single place while using th...
The Verge - AI · 3 min

Anthropic Supply-Chain Risk Label Should Stay in Place, Appeals Court Says | WIRED
The AI company now faces conflicting rulings in its fight over how Claude can be used by the US military.
Wired - AI · 6 min

Tubi is the first streamer to launch a native app within ChatGPT | TechCrunch
Tubi becomes the first streaming service to offer an app integration within ChatGPT, the AI chatbot that millions of users turn to for an...
TechCrunch - AI · 3 min
