[2602.22469] Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models

arXiv - AI

Summary

This paper introduces Spatial Credit Redistribution (SCR) to address hallucinations in vision-language models by redistributing activation credit from dominant patches to contextual areas, enhancing model accuracy.

Why It Matters

The research tackles a significant failure mode of vision-language models: hallucinations that produce descriptions of objects absent from the image. Because SCR is training-free and applied at inference time, it offers a practical fix that developers and researchers in AI and computer vision can adopt without retraining their models.

Key Takeaways

  • SCR reduces hallucination rates in vision-language models by redistributing activation credit.
  • The method is training-free and can be applied during inference, making it practical for real-time use.
  • SCR shows significant improvements over existing methods, with lower overhead and better performance metrics.

Computer Science > Computer Vision and Pattern Recognition

arXiv:2602.22469 (cs) [Submitted on 25 Feb 2026]

Title: Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models

Authors: Niamul Hassan Samin, Md Arifur Rahman, Abdullah Ibne Hanif, Juena Ahmed Noshin, Md Ashikur Rahman

Abstract: Vision-language models (VLMs) frequently hallucinate objects absent from the input image. We trace this failure to spatial credit collapse: activation credit concentrating on sparse visual patches in early transformer layers, which suppresses contextual evidence and increases reliance on language priors. We introduce Spatial Credit Redistribution (SCR), a training-free inference-time intervention that redistributes hidden-state activation from high-attention source patches to their context, guided by low-entropy inputs. We evaluate six model families (Chameleon, LLaVA, and Qwen, including both Qwen-VL and Qwen2-VL) at scales of 7B, 13B, and 30B, on POPE and CHAIR benchmarks. SCR reduces hallucination by ~4.7-6.0 percentage points on POPE-Adversarial, cuts CHAIR-s by 3.7-5.2 percentage points (42-51 percent relative) and CHAIR-i by 2.7-4.4 percentage points (44-58 percent relative), and preserves CIDEr within 0.8 percentage points. Gains are largest for low-entropy in...
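The abstract describes moving hidden-state activation from a few high-attention "dominant" patches onto the surrounding context patches at inference time. The paper's exact procedure is not reproduced here; the following is a minimal sketch of that kind of redistribution, with the function name, `top_k`, and `alpha` parameters all hypothetical choices for illustration:

```python
import numpy as np

def spatial_credit_redistribution(hidden, attn, top_k=4, alpha=0.5):
    """Hypothetical sketch of SCR-style redistribution.

    Shifts a fraction `alpha` of the hidden-state activation of the
    `top_k` highest-attention ("dominant") patches onto the remaining
    context patches, preserving the total activation per feature.

    hidden: (num_patches, dim) hidden states at one transformer layer
    attn:   (num_patches,) attention mass received by each patch
    """
    num_patches = hidden.shape[0]
    assert 0 < top_k < num_patches, "need at least one context patch"

    # Dominant source patches: the ones receiving the most attention.
    src = np.argsort(attn)[-top_k:]
    ctx = np.setdiff1d(np.arange(num_patches), src)

    out = hidden.copy()
    # Credit removed from the dominant patches...
    credit = alpha * out[src].sum(axis=0)
    out[src] *= (1.0 - alpha)
    # ...is spread uniformly over the context patches.
    out[ctx] += credit / len(ctx)
    return out
```

In a real VLM this would run on early-layer hidden states during the forward pass (e.g. via a forward hook), and the abstract's "guided by low-entropy inputs" suggests the intervention strength would further depend on the entropy of the attention distribution, which this sketch omits.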

