[2505.22976] Toward Memory-Aided World Models: Benchmarking via Spatial Consistency
Computer Science > Computer Vision and Pattern Recognition
arXiv:2505.22976 (cs)
[Submitted on 29 May 2025 (v1), last revised 8 Apr 2026 (this version, v2)]
Title: Toward Memory-Aided World Models: Benchmarking via Spatial Consistency
Authors: Kewei Lian, Shaofei Cai, Yilun Du, Yitao Liang
Abstract: The ability to simulate the world in a spatially consistent manner is a crucial requirement for effective world models. Such a model enables high-quality visual generation and also ensures the reliability of world models for downstream tasks such as simulation and planning. Designing a memory module is crucial for addressing spatial consistency: such a model must not only retain long-horizon observational information but also enable the construction of explicit or implicit internal spatial representations. However, there are no datasets designed to promote the development of memory modules by explicitly enforcing spatial consistency constraints. Furthermore, most existing benchmarks primarily emphasize visual coherence or generation quality, neglecting the requirement of long-range spatial consistency. To bridge this gap, we construct a dataset and corresponding benchmark by sampling 150 distinct locations within the open-world environment of Minecraft, collecting about 250 hours (20 million frames) of ...