[2603.01124] ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.01124 (cs)
[Submitted on 1 Mar 2026]

Title: ClinCoT: Clinical-Aware Visual Chain-of-Thought for Medical Vision Language Models

Authors: Xiwei Liu, Yulong Li, Xinlin Zhuang, Xuhui Li, Jianxu Chen, Haolin Yang, Imran Razzak, Yutong Xie

Abstract: Medical Vision-Language Models have shown promising potential in clinical decision support, yet they remain prone to factual hallucinations due to insufficient grounding in localized pathological evidence. Existing medical alignment methods primarily operate at the response level through preference optimization, improving output correctness but leaving intermediate reasoning weakly connected to visual regions. Although chain-of-thought (CoT) enhances multimodal reasoning, it remains largely text-centric, limiting effective integration of clinical visual cues. To address this gap, we propose ClinCoT, a clinical-aware visual chain-of-thought framework that transforms preference optimization from response-level correction to visual-driven reasoning. We introduce an automatic data generation pipeline that constructs clinically grounded preference pairs through reasoning with hypotheses-driven region proposals. Multiple Med-LLMs evaluators rank and assign scores to each response, and these rankings ...