[2510.08713] Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight
Computer Science > Artificial Intelligence

arXiv:2510.08713 (cs)
[Submitted on 9 Oct 2025 (v1), last revised 22 Mar 2026 (this version, v2)]

Title: Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight
Authors: Yifei Dong, Fengyi Wu, Guangyu Chen, Lingdong Kong, Xu Zhu, Qiyu Hu, Yuxuan Zhou, Jingdong Sun, Jun-Yan He, Qi Dai, Alexander G. Hauptmann, Zhi-Qi Cheng

Abstract: Enabling embodied agents to imagine future states is essential for robust and generalizable visual navigation. Yet, state-of-the-art systems typically rely on modular designs that decouple navigation planning from visual world modeling, which often induces state-action misalignment and weak adaptability in novel or dynamic scenarios. We propose UniWM, a unified, memory-augmented world model that integrates egocentric visual foresight and planning within a single multimodal autoregressive backbone. UniWM explicitly grounds action selection in visually imagined outcomes, tightly aligning prediction with control. Meanwhile, a hierarchical memory mechanism fuses short-term perceptual cues with longer-term trajectory context, supporting stable and coherent reasoning over extended horizons. Extensive experiments on four challenging benchmarks (Go Stanford, ReCon, SCAND, HuRoN) and the 1X...
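The abstract describes a hierarchical memory that fuses short-term perceptual cues with longer-term trajectory context. The sketch below is purely illustrative and is not UniWM's implementation: the class name, the fixed-size recent-observation window, and the exponentially decayed trajectory summary are all assumptions standing in for whatever the paper's architecture actually uses.

```python
from collections import deque


class HierarchicalMemory:
    """Hypothetical sketch of a two-level memory: a short-term window of
    recent observation features plus a long-term running summary of the
    trajectory. Not the paper's method; names and fusion rule are assumed."""

    def __init__(self, window=4, decay=0.9):
        self.window = deque(maxlen=window)  # short-term perceptual cues
        self.summary = None                 # long-term trajectory context
        self.decay = decay                  # weight on older context

    def update(self, feature):
        """`feature` is a list of floats standing in for an embedding."""
        self.window.append(list(feature))
        if self.summary is None:
            self.summary = list(feature)
        else:
            # Exponential moving average keeps a compact trajectory summary.
            self.summary = [self.decay * s + (1 - self.decay) * f
                            for s, f in zip(self.summary, feature)]

    def fused_context(self):
        """Fuse by concatenating the short-term mean with the summary."""
        n = len(self.window)
        short = [sum(col) / n for col in zip(*self.window)]
        return short + self.summary


# Example: two 2-d observations yield a 4-d fused context vector.
mem = HierarchicalMemory(window=2, decay=0.9)
mem.update([1.0, 0.0])
mem.update([0.0, 1.0])
context = mem.fused_context()  # short-term mean followed by the EMA summary
```

In a real system the fused context would condition the autoregressive backbone at each planning step; here the concatenation merely makes the two time scales explicit.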