[2510.16688] Pursuing Minimal Sufficiency in Spatial Reasoning
Computer Science > Computer Vision and Pattern Recognition

arXiv:2510.16688 (cs)

[Submitted on 19 Oct 2025 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: Pursuing Minimal Sufficiency in Spatial Reasoning
Authors: Yejie Guo, Yunzhong Hou, Wufei Ma, Meng Tang, Ming-Hsuan Yang

Abstract: Spatial reasoning, the ability to ground language in 3D understanding, remains a persistent challenge for Vision-Language Models (VLMs). We identify two fundamental bottlenecks: inadequate 3D understanding capabilities stemming from 2D-centric pre-training, and reasoning failures induced by redundant 3D information. To address these, we first construct a Minimal Sufficient Set (MSS) of information before answering a given question: a compact selection of 3D perception results from expert models. We introduce MSSR (Minimal Sufficient Spatial Reasoner), a dual-agent framework that implements this principle. A Perception Agent programmatically queries 3D scenes using a versatile perception toolbox to extract sufficient information, including a novel SOG (Situated Orientation Grounding) module that robustly extracts language-grounded directions. A Reasoning Agent then iteratively refines this information to pursue minimality, pruning redundant details and requesting missing ones in a closed loop until the MSS is curated. Extensive experimen...
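The abstract's closed loop between the two agents can be sketched as follows. This is a minimal, hypothetical illustration only: all class names, method signatures, and the toy "scene" are assumptions made for illustration, not the paper's actual interfaces, and the real agents would be backed by 3D perception models rather than dictionary lookups.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the MSSR closed loop described in the abstract:
# a Perception Agent extracts facts; a Reasoning Agent prunes redundant
# ones and requests missing ones until a Minimal Sufficient Set remains.
# Every name here is illustrative, not taken from the paper.

@dataclass
class Verdict:
    sufficient: bool
    redundant: list = field(default_factory=list)  # facts to prune
    missing: list = field(default_factory=list)    # follow-up queries

class PerceptionAgent:
    """Stub: answers queries against a toy 'scene' (a dict of facts)."""
    def __init__(self, scene):
        self.scene = scene
    def query(self, keys):
        return [(k, self.scene[k]) for k in keys if k in self.scene]

class ReasoningAgent:
    """Stub: the set is sufficient once the needed keys are present;
    anything else is flagged as redundant."""
    def __init__(self, needed):
        self.needed = set(needed)
    def review(self, facts):
        have = {k for k, _ in facts}
        missing = sorted(self.needed - have)
        redundant = [f for f in facts if f[0] not in self.needed]
        return Verdict(sufficient=not missing, redundant=redundant,
                       missing=missing)

def curate_mss(perceiver, reasoner, initial_queries, max_rounds=5):
    facts = perceiver.query(initial_queries)
    for _ in range(max_rounds):
        verdict = reasoner.review(facts)
        # pursue minimality: drop facts the reasoner deems redundant
        facts = [f for f in facts if f not in verdict.redundant]
        if verdict.sufficient:
            break
        # closed loop: request the information still missing
        facts += perceiver.query(verdict.missing)
    return facts

scene = {"chair_pos": (1, 0, 0), "table_pos": (2, 0, 1), "lamp_color": "red"}
perceiver = PerceptionAgent(scene)
reasoner = ReasoningAgent(needed=["chair_pos", "table_pos"])
mss = curate_mss(perceiver, reasoner, initial_queries=["chair_pos", "lamp_color"])
```

In this toy run, the redundant `lamp_color` fact is pruned and the missing `table_pos` fact is fetched in a second round, leaving exactly the two facts the reasoner needs.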