[2602.20677] UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
Summary
The paper presents UrbanFM, a novel framework for scaling urban spatio-temporal foundation models, addressing challenges in generalizability and data integration across diverse urban systems.
Why It Matters
Urban spatio-temporal data encodes patterns of human mobility and city evolution, yet existing models are often overfitted to specific regions or tasks. UrbanFM aims to unify the modeling of urban data, which could improve predictive analytics and decision-making in urban planning and smart city initiatives.
Key Takeaways
- UrbanFM addresses the fragmentation in urban computing by taking scaling as its central perspective, asking what to scale and how to scale.
- The framework introduces WorldST, a billion-scale dataset standardizing urban data from over 100 cities.
- UrbanFM employs a minimalist architecture that learns dynamic dependencies autonomously.
- EvalST is established as the largest urban spatio-temporal benchmark to date.
- The model demonstrates zero-shot generalization across various unseen urban scenarios.
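This summary does not specify the unified data format that WorldST uses. As a rough illustration only, the idea of standardizing heterogeneous signals from different cities could look something like the sketch below. The names `STSeries`, `standardize`, and `unit_scale`, the field layout, and the per-series z-score normalization are all assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class STSeries:
    """One sensor's readings in a hypothetical unified format (not WorldST's actual schema)."""
    city: str
    sensor_id: str
    signal: str          # e.g. "flow" or "speed"
    step_minutes: int    # sampling interval of the source
    values: list         # normalized readings, one per time step


def standardize(city, sensor_id, signal, step_minutes, raw, unit_scale=1.0):
    """Convert raw readings to a common unit, then z-score them per series.

    The z-score step is an assumed normalization chosen for illustration.
    """
    scaled = [v * unit_scale for v in raw]
    mu = mean(scaled)
    sigma = pstdev(scaled) or 1.0  # avoid dividing by zero on constant series
    normalized = [(v - mu) / sigma for v in scaled]
    return STSeries(city, sensor_id, signal, step_minutes, normalized)


# Two made-up sources with different units and sampling intervals.
MPH_TO_KMH = 1.609344
a = standardize("city_a", "s1", "speed", 5, [30.0, 40.0, 50.0], MPH_TO_KMH)
b = standardize("city_b", "s9", "flow", 15, [120.0, 120.0, 120.0])
```

After this step both series share one record type and a comparable value scale, regardless of the original unit or city, which is the property a single model would need in order to train across sources.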
arXiv:2602.20677 (cs), Computer Science > Machine Learning
Submitted on 24 Feb 2026
Title: UrbanFM: Scaling Urban Spatio-Temporal Foundation Models
Authors: Wei Chen, Yuqian Wu, Junle Chen, Xiaofang Zhou, Yuxuan Liang
Abstract: Urban systems, as dynamic complex systems, continuously generate spatio-temporal data streams that encode the fundamental laws of human mobility and city evolution. While AI for Science has witnessed the transformative power of foundation models in disciplines like genomics and meteorology, urban computing remains fragmented due to "scenario-specific" models, which are overfitted to specific regions or tasks, hindering their generalizability. To bridge this gap and advance spatio-temporal foundation models for urban systems, we adopt scaling as the central perspective and systematically investigate two key questions: what to scale and how to scale. Grounded in first-principles analysis, we identify three critical dimensions: heterogeneity, correlation, and dynamics, aligning these principles with the fundamental scientific properties of urban spatio-temporal data. Specifically, to address heterogeneity through data scaling, we construct WorldST. This billion-scale corpus standardizes diverse physical signals, such as traffic flow and speed, from over 100 global cities into a unified data format. To enable computation scalin...