[2502.01969] Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration
Computer Science > Computer Vision and Pattern Recognition

arXiv:2502.01969 (cs)

[Submitted on 4 Feb 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration

Authors: Younan Zhu, Linwei Tao, Minjing Dong, Chang Xu

Abstract: Large Vision-Language Models (LVLMs) exhibit impressive multimodal reasoning capabilities but remain highly susceptible to object hallucination, where models generate responses that are not factually aligned with the visual content. Recent works attribute this issue to an inherent bias in LVLMs whereby the vision-token attention map focuses spuriously on certain positions, and propose to mitigate the issue by reordering visual tokens. However, we find that different LVLMs exhibit different correlations between attention and spatial position, making existing static solutions difficult to generalize to other LVLMs. We first investigate the attention bias introduced by image tokens through a toy experiment in which a blank image is fed into the model to capture its position-dependent bias. Removing this bias from the original attention map already yields a substantial reduction in hallucinations. This proof of concept validates the core intuition behind...
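The toy experiment described in the abstract (estimate a position-dependent bias from a blank image, subtract it from the real attention map, and renormalize) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `calibrate_attention`, the clip-then-renormalize step, and the synthetic attention maps are illustrative assumptions standing in for the attention weights an actual LVLM would produce.

```python
import numpy as np

def calibrate_attention(attn, blank_attn, eps=1e-8):
    """Remove a position-dependent bias (estimated by feeding the model
    a blank image) from a vision-token attention map, then renormalize.

    attn, blank_attn: (num_queries, num_vision_tokens) arrays whose rows
    sum to 1. Clipping at zero keeps the debiased weights non-negative
    before renormalization. This is an illustrative sketch, not the
    paper's exact calibration procedure.
    """
    debiased = np.clip(attn - blank_attn, 0.0, None)
    return debiased / (debiased.sum(axis=-1, keepdims=True) + eps)

# Toy example: spurious positional focus concentrated on the last token.
rng = np.random.default_rng(0)
attn = rng.dirichlet(np.ones(8), size=4)      # stand-in for real-image attention
blank = np.zeros((4, 8))
blank[:, -1] = 0.3                            # bias captured from a blank image
out = calibrate_attention(attn, blank)
print(out.shape)                              # rows again sum to ~1
```

In this toy setup, subtracting the blank-image attention redistributes weight away from the spuriously favored last position, mirroring the abstract's claim that removing the bias alone already reduces hallucinations.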