[2510.15018] UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.15018 (cs)
[Submitted on 16 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos
Authors: Mingxuan Liu, Honglin He, Elisa Ricci, Wayne Wu, Bolei Zhou

Abstract: Urban embodied AI agents, ranging from delivery robots to quadrupeds, are increasingly populating our cities, navigating chaotic streets to provide last-mile connectivity. Training such agents requires diverse, high-fidelity urban environments to scale, yet existing human-crafted or procedurally generated simulation scenes either lack scalability or fail to capture real-world complexity. We introduce UrbanVerse, a data-driven real-to-sim system that converts crowd-sourced city-tour videos into physics-aware, interactive simulation scenes. UrbanVerse consists of: (i) UrbanVerse-100K, a repository of 100k+ annotated urban 3D assets with semantic and physical attributes, and (ii) UrbanVerse-Gen, an automatic pipeline that extracts scene layouts from video and instantiates metric-scale 3D simulations using retrieved assets. Running in IsaacSim, UrbanVerse offers 160 high-quality constructed scenes from 24 countries, along with a curated benchmark of 10 artist-designed test scenes. Experiments show that UrbanVerse scenes preserve real-wo...