[2509.25390] SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
Computer Science > Computer Vision and Pattern Recognition
arXiv:2509.25390 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 28 Feb 2026 (this version, v2)]

Title: SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

Authors: Yuyou Zhang, Radu Corcodel, Chiori Hori, Anoop Cherian, Ding Zhao

Abstract: We present SpinBench, a cognitively grounded diagnostic benchmark for evaluating spatial reasoning in vision-language models (VLMs). SpinBench is designed around the core challenge of spatial reasoning: perspective taking, the ability to reason about how scenes and object relations change under viewpoint transformation. Since perspective taking requires multiple cognitive capabilities, such as recognizing objects across views, grounding relative positions, and mentally simulating transformations, SpinBench introduces a set of fine-grained diagnostic categories. Our categories target translation, rotation, object relative pose, and viewpoint change, and are progressively structured so that simpler single-object tasks scaffold toward the most demanding multi-object perspective-taking setting. We evaluate 43 state-of-the-art VLMs, both proprietary and open source. Results reveal systematic weaknesses: strong egocentric bias, poor rotational understanding, and inconsistencies under symmetrical and syntactic refor...