[2602.22469] Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
Summary
This paper introduces Spatial Credit Redistribution (SCR) to address hallucinations in vision-language models by redistributing activation credit from dominant patches to contextual areas, enhancing model accuracy.
Why It Matters
The research tackles a significant issue in vision-language models, where hallucinations can lead to inaccurate outputs. By proposing SCR, the authors provide a practical solution that improves grounding at inference time without any retraining, making it relevant for developers and researchers in AI and computer vision.
Key Takeaways
- SCR reduces hallucination rates in vision-language models by redistributing activation credit.
- The method is training-free and can be applied during inference, making it practical for real-time use.
- SCR shows significant improvements over existing methods, with lower overhead and better performance metrics.
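To make the core idea concrete, here is a minimal sketch of what a spatial-credit-redistribution step could look like. This is an illustrative reconstruction, not the authors' implementation: the function name, the uniform redistribution rule, and the parameters `top_frac` and `alpha` are all assumptions; the paper's actual entropy-guided mechanism may differ.

```python
import numpy as np

def spatial_credit_redistribution(hidden, attn, top_frac=0.1, alpha=0.5):
    """Hypothetical sketch of SCR-style redistribution.

    Shifts a fraction `alpha` of the activation held by the
    highest-attention ("dominant") patches onto the remaining
    contextual patches, leaving total activation unchanged.

    hidden: (num_patches, dim) patch hidden states
    attn:   (num_patches,) attention mass per patch
    """
    n = hidden.shape[0]
    k = max(1, int(top_frac * n))
    # Indices of the k patches receiving the most attention.
    dominant = np.argsort(attn)[-k:]
    context = np.setdiff1d(np.arange(n), dominant)
    out = hidden.copy()
    # Credit withdrawn from the dominant patches...
    surplus = alpha * hidden[dominant].sum(axis=0)
    out[dominant] *= (1.0 - alpha)
    # ...is spread uniformly over the contextual patches.
    out[context] += surplus / len(context)
    return out
```

Because the withdrawn credit is added back to the context, the per-dimension sum of activations over all patches is conserved; only its spatial distribution changes, which is the intuition behind counteracting credit collapse.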
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.22469 (cs) [Submitted on 25 Feb 2026]
Title: Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
Authors: Niamul Hassan Samin, Md Arifur Rahman, Abdullah Ibne Hanif, Juena Ahmed Noshin, Md Ashikur Rahman
Abstract: Vision-language models (VLMs) frequently hallucinate objects absent from the input image. We trace this failure to spatial credit collapse: activation credit concentrating on sparse visual patches in early transformer layers, which suppresses contextual evidence and increases reliance on language priors. We introduce Spatial Credit Redistribution (SCR), a training-free inference-time intervention that redistributes hidden-state activation from high-attention source patches to their context, guided by low-entropy inputs. We evaluate six model families (Chameleon, LLaVA, and Qwen, including both Qwen-VL and Qwen2-VL) at scales of 7B, 13B, and 30B, on POPE and CHAIR benchmarks. SCR reduces hallucination by ~4.7-6.0 percentage points on POPE-Adversarial, cuts CHAIR-s by 3.7-5.2 percentage points (42-51 percent relative) and CHAIR-i by 2.7-4.4 percentage points (44-58 percent relative), and preserves CIDEr within 0.8 percentage points. Gains are largest for low-entropy in...