[2603.20662] Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
Computer Science > Artificial Intelligence
arXiv:2603.20662 (cs)
[Submitted on 21 Mar 2026]

Title: Attention in Space: Functional Roles of VLM Heads for Spatial Reasoning
Authors: Xueqi Ma, Shuo Yang, Yanbei Jiang, Shu Liu, Zhenzhen Liu, Jiayang Ao, Xingjun Ma, Sarah Monazam Erfani, James Bailey

Abstract: Despite remarkable advances in large Vision-Language Models (VLMs), spatial reasoning remains a persistent challenge. In this work, we investigate how attention heads within VLMs contribute to spatial reasoning by analyzing their functional roles through a mechanistic interpretability lens. We introduce CogVSR, a dataset that decomposes complex spatial reasoning questions into step-by-step subquestions designed to simulate human-like reasoning via a chain-of-thought paradigm, with each subquestion linked to a specific cognitive function such as spatial perception or relational reasoning. Building on CogVSR, we develop a probing framework to identify and characterize attention heads specialized for these functions. Our analysis across diverse VLM families reveals that these functional heads are universally sparse and vary in number and distribution across functions. Notably, heads specialized for spatial functions are scarcer than those for other cognitive functions. We propose methods to activate laten...
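The abstract describes probing attention heads to find those specialized for a given cognitive function. A common way to do this (not necessarily the paper's exact method) is to train a lightweight linear probe on each head's activations and rank heads by probe accuracy; sparse, high-scoring heads are the candidate functional heads. The sketch below illustrates this on synthetic data: the array shapes, the head indices carrying signal, and the probe hyperparameters are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: per-head activations for n_examples inputs, each labeled
# with whether the input exercises a target function (e.g. spatial perception).
n_examples, n_heads, d_head = 200, 16, 32
X = rng.normal(size=(n_examples, n_heads, d_head))
y = rng.integers(0, 2, size=n_examples)

# Toy signal: make two heads (3 and 7) weakly predictive of the label,
# mimicking a sparse set of "functional" heads among mostly irrelevant ones.
signal = rng.normal(size=d_head)
for h in (3, 7):
    X[:, h] += np.outer(2 * y - 1, signal)

def probe_accuracy(feats, labels, steps=500, lr=0.1):
    """Fit a logistic-regression probe on one head's features via gradient
    descent and return its training accuracy as the head's score."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        w -= lr * feats.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    preds = (feats @ w + b) > 0
    return float(np.mean(preds == labels))

# Score every head independently; heads whose activations linearly predict
# the function label are candidate functional heads.
scores = np.array([probe_accuracy(X[:, h], y) for h in range(n_heads)])
top_heads = np.argsort(scores)[::-1][:3]
print("candidate functional heads:", top_heads)
```

On this toy data the planted heads 3 and 7 score near-perfectly while the remaining heads hover closer to chance, so the ranking recovers the sparse functional set; a real analysis would score held-out data rather than training accuracy to avoid rewarding overfit probes.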