[2604.02543] Overconfidence and Calibration in Medical VQA: Empirical Findings and Hallucination-Aware Mitigation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.02543 (cs)
[Submitted on 2 Apr 2026]

Title: Overconfidence and Calibration in Medical VQA: Empirical Findings and Hallucination-Aware Mitigation
Authors: Ji Young Byun, Young-Jin Park, Jean-Philippe Corbeil, Asma Ben Abacha

Abstract: As vision-language models (VLMs) are increasingly deployed in clinical decision support, accuracy alone is not enough: knowing when to trust a model's predictions is equally critical. Yet systematic investigations of overconfidence in these models remain scarce in the medical domain. We address this gap through a comprehensive empirical study of confidence calibration in VLMs, spanning three model families (Qwen3-VL, InternVL3, LLaVA-NeXT), three model scales (2B--38B), and multiple confidence-estimation prompting strategies, across three medical visual question answering (VQA) benchmarks. Our study yields three key findings. First, overconfidence persists across model families and is not resolved by scaling or by prompting strategies such as chain-of-thought or verbalized-confidence variants. Second, simple post-hoc calibration approaches, such as Platt scaling, reduce calibration error and consistently outperform prompt-based strategies. Third, due to their ...
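To make the post-hoc calibration baseline concrete, below is a minimal sketch of Platt scaling applied to verbalized confidence scores, with expected calibration error (ECE) measured before and after. The synthetic data, split sizes, and the use of scikit-learn's LogisticRegression are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression

def expected_calibration_error(conf, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence and compare mean
    confidence to empirical accuracy within each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

rng = np.random.default_rng(0)
# Simulated overconfident model: stated confidence clusters near 0.9,
# but true accuracy is only about 0.6 (values chosen for illustration).
raw_conf = rng.uniform(0.7, 1.0, size=2000)
correct = (rng.uniform(size=2000) < 0.6).astype(int)

# Platt scaling: fit a logistic map on the logit of the raw confidence
# using a held-out calibration split, then rescale the evaluation split.
logits = np.log(raw_conf / (1.0 - raw_conf + 1e-8)).reshape(-1, 1)
platt = LogisticRegression().fit(logits[:1000], correct[:1000])
calibrated = platt.predict_proba(logits[1000:])[:, 1]

print("ECE before:", expected_calibration_error(raw_conf[1000:], correct[1000:]))
print("ECE after: ", expected_calibration_error(calibrated, correct[1000:]))

On this synthetic setup, the fitted logistic map pulls the inflated confidences down toward the empirical accuracy, which is the same mechanism by which post-hoc calibration reduces calibration error in the study's findings.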