[2504.08806] Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation
Computer Science > Artificial Intelligence
arXiv:2504.08806 (cs)
[Submitted on 9 Apr 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation

Authors: Qianqian Bai, Zhongpu Chen, Ling Luo, Huaming Du, Yuqian Lei, Ziyun Jiao

Abstract: Enhancing the spatial perception capabilities of mobile robots is crucial for achieving embodied Vision-and-Language Navigation (VLN). Although significant progress has been made in simulated environments, directly transferring these capabilities to real-world scenarios often results in severe hallucination phenomena, causing robots to lose effective spatial awareness. To address this issue, we propose BrainNav, a bio-inspired spatial cognitive navigation framework inspired by biological spatial cognition theories and cognitive map theory. BrainNav integrates dual-map (coordinate map and topological map) and dual-orientation (relative orientation and absolute orientation) strategies, enabling real-time navigation through dynamic scene capture and path planning. Its five core modules (Hippocampal Memory Hub, Visual Cortex Perception Engine, Parietal Spatial Constructor, Prefrontal Decision Center, and Cerebellar Motion Execution Unit) mimic biological cognitive...