[2602.20066] HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images
Summary
The paper presents HeatPrompt, a zero-shot vision-language framework that estimates annual urban heat demand from satellite images to support energy planning in data-scarce regions.
Why It Matters
As cities strive to decarbonize heating systems, accurate heat-demand mapping is essential. HeatPrompt uses pretrained vision-language models to extract heat-relevant features from satellite imagery. This gives municipalities heat-demand estimates even when detailed building-level information is missing, which supports climate action efforts.
Key Takeaways
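- HeatPrompt prompts pretrained vision-language models (VLMs) to describe heat-relevant visual attributes in RGB satellite images, such as roof age and building density, and uses these descriptions to estimate annual heat demand.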
- An MLP regressor trained on the VLM captions achieves a 93.7% R^2 uplift and a 30% lower MAE than the baseline model.
- It is particularly useful for municipalities lacking detailed building data.
- Qualitative analysis shows that high-impact tokens align with high-demand zones.
- The approach supports energy planning in data-scarce areas.
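Both headline numbers are relative changes measured against a baseline model. The abstract does not give the exact formula, so this sketch assumes that "R^2 uplift" means the relative increase (R^2_new - R^2_base) / R^2_base and that the MAE reduction is the same relative change applied to MAE, where a negative value means the error shrank. The helper names and example values are hypothetical and are not results from the paper.

```python
def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_change(baseline: float, new: float) -> float:
    """Assumed definition: (new - baseline) / |baseline|; negative means a decrease."""
    return (new - baseline) / abs(baseline)


if __name__ == "__main__":
    # Illustrative baseline/model scores only, NOT numbers reported in the paper.
    print(f"R^2 uplift: {relative_change(0.40, 0.7748):.1%}")  # R^2 uplift: 93.7%
    print(f"MAE change: {relative_change(100.0, 70.0):.0%}")   # MAE change: -30%
```

Under this reading, the same 93.7% uplift can come from very different absolute R^2 values. For example, a baseline of 0.40 and a baseline of 0.20 would both produce it. The absolute scores therefore matter when comparing against other methods.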
arXiv:2602.20066 (cs) [Submitted on 23 Feb 2026]
Authors: Kundan Thota, Xuanhao Mu, Thorsten Schlachter, Veit Hagenmeyer
Abstract: Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack the detailed building-level data needed to calculate them. We introduce HeatPrompt, a zero-shot vision-language energy modeling framework that estimates annual heat demand using semantic features extracted from satellite images, basic Geographic Information System (GIS) data, and building-level features. We feed pretrained Large Vision Language Models (VLMs) a domain-specific prompt so that they act as an energy planner. The models then extract visual attributes that correspond to the thermal load, such as roof age and building density, from the RGB satellite image. A Multi-Layer Perceptron (MLP) regressor trained on these captions shows an $R^2$ uplift of 93.7% and shrinks the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.
Subjects: Computer Vision and Pattern...
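The abstract names the pipeline stages: a domain-specific energy-planner prompt, VLM captions, and an MLP regressor over caption, GIS, and building-level features. It does not say how captions are encoded. This minimal sketch shows one possible feature-construction step under stated assumptions. The prompt text is hypothetical wording. Captions are encoded as bag-of-words counts, which is a stand-in and not necessarily the authors' encoding. The GIS values (footprint area, building count) are hypothetical, and the VLM call itself is omitted.

```python
import re
from collections import Counter

# Hypothetical prompt wording; the paper's exact prompt is not given in the abstract.
ENERGY_PLANNER_PROMPT = (
    "You are an urban energy planner. Describe the visual attributes in this "
    "RGB satellite image that affect annual heat demand, such as roof age "
    "and building density."
)


def tokenize(caption: str) -> list[str]:
    """Lower-case the caption and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", caption.lower())


def build_vocabulary(captions: list[str]) -> list[str]:
    """Collect every token seen in the training captions, in sorted order."""
    return sorted({tok for caption in captions for tok in tokenize(caption)})


def caption_features(caption: str, vocab: list[str]) -> list[float]:
    """Bag-of-words counts over a fixed vocabulary (assumed encoding)."""
    counts = Counter(tokenize(caption))
    return [float(counts[tok]) for tok in vocab]


def combine_features(caption: str, vocab: list[str], gis: list[float]) -> list[float]:
    """Append numeric GIS/building features (hypothetical) to the caption features."""
    return caption_features(caption, vocab) + [float(v) for v in gis]


if __name__ == "__main__":
    # Hypothetical VLM captions for two satellite tiles.
    captions = ["Dense old buildings with dark roofs", "Sparse new buildings"]
    vocab = build_vocabulary(captions)
    # Hypothetical GIS features: total footprint area (m^2) and building count.
    x = combine_features(captions[0], vocab, [1200.0, 3])
    print(vocab)
    print(x)
```

With bag-of-words counts, each feature maps to exactly one caption token, so token-level importance is easy to inspect. The resulting vectors could then be passed to an MLP regressor such as scikit-learn's `sklearn.neural_network.MLPRegressor`.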