[2603.06665] Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine


Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.06665 (cs)
[Submitted on 2 Mar 2026 (v1), last revised 10 Apr 2026 (this version, v2)]

Title: Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
Authors: Yuan Wu, Zongxian Yang, Jiayu Qian, Songpan Gao, Guanxing Chen, Qiankun Li, Yu-An Huang, Zhi-An Huang

Abstract: Large vision-language models (VLMs) often benefit from chain-of-thought (CoT) prompting in general domains, yet its efficacy in medical vision-language tasks remains underexplored. We report a counter-intuitive trend: on medical visual question answering, CoT frequently underperforms direct answering (DirA) across general-purpose and medical-specific models. We attribute this to a \emph{medical perception bottleneck}: subtle, domain-specific cues can weaken visual grounding, and CoT may compound early perceptual uncertainty rather than correct it. To probe this hypothesis, we introduce two training-free, inference-time grounding interventions: (i) \emph{perception anchoring} via region-of-interest cues and (ii) \emph{description grounding} via high-quality textual guidance. Across multiple benchmarks and model families, these interventions improve accuracy, mitigate CoT degradation, and in several settings reverse the CoT--DirA inversion. Our findings sugge...
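The "perception anchoring" intervention described in the abstract can be sketched as a prompt-construction step: attach a region-of-interest (ROI) cue to the query so the model's chain of thought is grounded in the relevant image region before reasoning begins. The function below is a minimal illustration, not the authors' implementation; the prompt wording, the returned message structure, and how a caller forwards it to a VLM are all assumptions.

```python
# Hypothetical sketch of perception anchoring via an ROI cue.
# The prompt format and output structure are illustrative assumptions;
# a real caller would translate the result into its VLM's message format.

def anchored_query(image_size, roi, question):
    """Build an ROI-grounded prompt for a medical VQA query.

    image_size: (width, height) of the input image in pixels.
    roi: (left, upper, right, lower) bounding box in pixel coordinates.
    question: the visual question to answer.
    """
    width, height = image_size
    left, upper, right, lower = roi
    if not (0 <= left < right <= width and 0 <= upper < lower <= height):
        raise ValueError("ROI must lie inside the image bounds")
    prompt = (
        f"Focus first on the region x:[{left},{right}], y:[{upper},{lower}] "
        "of the image; it contains the clinically relevant finding. "
        "Describe what you observe in that region, then reason step by step "
        f"and give a final answer. Question: {question}"
    )
    # The ROI is returned alongside the prompt so the caller can also pass
    # a cropped view of that region to the model as a second image.
    return {"roi": roi, "prompt": prompt}
```

In use, a caller would pair this prompt with the full image (and optionally a crop of the ROI) in a single VLM request; the point is that the grounding cue precedes the CoT instruction, so perception errors are constrained before reasoning compounds them.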

Originally published on April 13, 2026. Curated by AI News.

