[2604.09529] VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.09529 (cs)
[Submitted on 10 Apr 2026]
Title: VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
Authors: Wenyi Xiao, Xinchi Xu, Leilei Gan
Abstract: Large Vision-Language Models (LVLMs) achieve strong multimodal reasoning but frequently produce hallucinations and incorrect responses with high certainty, which hinders their use in high-stakes domains. Existing verbalized confidence calibration methods, largely developed for text-only LLMs, typically optimize a single holistic confidence score using binary answer-level correctness. This design is mismatched to LVLMs: an incorrect prediction may arise from perceptual failures or from reasoning errors given correct perception, and a single confidence score conflates these sources, while visual uncertainty is often dominated by language priors. To address these issues, we propose VL-Calibration, a reinforcement learning framework that explicitly decouples confidence into visual confidence and reasoning confidence. To supervise visual confidence without ground-truth perception labels, we introduce an intrinsic visual certainty estimation that combines (i) visual grounding measured by KL-divergence under image perturbations and (ii) internal certainty meas...
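To make component (i) of the intrinsic visual certainty estimate concrete, below is a minimal sketch (not the paper's released code) of measuring visual grounding as the KL divergence between a model's answer-token distributions on the original image and on mildly perturbed copies. The function `lvlm_logits`, the Gaussian-noise `perturb`, and all hyperparameters here are illustrative assumptions, not the paper's specification.

```python
# Sketch: visual grounding via KL divergence under image perturbations.
# All names and hyperparameters below are assumptions for illustration.
import torch
import torch.nn.functional as F


def perturb(image: torch.Tensor, noise_std: float = 0.05) -> torch.Tensor:
    """One possible mild perturbation: additive Gaussian pixel noise."""
    return (image + noise_std * torch.randn_like(image)).clamp(0.0, 1.0)


@torch.no_grad()
def visual_grounding_score(lvlm_logits, image, question, n_perturb: int = 8):
    """Mean KL(p_orig || p_perturbed) over the answer-token distribution.

    `lvlm_logits(image, question)` is assumed to return unnormalized
    vocabulary logits for the answer token(s). A large mean KL indicates
    the output distribution is sensitive to the visual input; a small
    one suggests the answer is driven mostly by language priors.
    """
    log_p = F.log_softmax(lvlm_logits(image, question), dim=-1)
    kls = []
    for _ in range(n_perturb):
        log_q = F.log_softmax(lvlm_logits(perturb(image), question), dim=-1)
        # F.kl_div(input, target, log_target=True) computes
        # sum target * (log target - input), i.e. KL(p || q) here.
        kls.append(F.kl_div(log_q, log_p, log_target=True, reduction="sum"))
    return torch.stack(kls).mean()
```

Under these assumptions, such a score could serve as a label-free supervision signal for the visual-confidence component in an RL reward; how the paper combines it with the internal-certainty term is not recoverable from the truncated abstract.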