[2511.21471] SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Computer Science > Artificial Intelligence
arXiv:2511.21471 (cs)
[Submitted on 26 Nov 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Authors: Peiran Xu, Sudong Wang, Yao Zhu, Jianing Li, Gege Qi, Yunjian Zhang

Abstract: Spatial cognition is fundamental to real-world multimodal intelligence, enabling models to interact effectively with the physical environment. While multimodal large language models (MLLMs) have made significant strides, existing benchmarks often oversimplify spatial cognition, reducing it to a single-dimensional metric that fails to capture the hierarchical structure and interdependence of spatial abilities. To address this gap, we propose a hierarchical spatial cognition framework that decomposes spatial intelligence into five levels of increasing complexity, from basic observation to high-level planning. Building on this taxonomy, we construct SpatialBench, a large-scale, fine-grained benchmark covering 15 tasks aligned with these cognitive levels. To provide unified evaluation across heterogeneous tasks, we further introduce a high-level, capability-oriented metric that reliably assesses a model's overall spatial reasoning ability. Extensive experiments across a wide range of MLLMs reveal distinct...