[2509.09794] Synthetic Homes: A Multimodal Generative AI Pipeline for Residential Building Data Generation under Data Scarcity
arXiv:2509.09794 (Computer Science > Artificial Intelligence)
Submitted on 11 Sep 2025 (v1); last revised 8 Apr 2026 (this version, v4)
Authors: Jackson Eshbaugh, Chetan Tiwari, Jorge Silveyra

Abstract: Computational models have emerged as powerful tools for multi-scale energy modeling research at the building and urban scales, supporting data-driven analysis of the corresponding energy systems. However, these models require large amounts of building parameter data that are often inaccessible, expensive to collect, or subject to privacy constraints. We introduce a modular, multimodal generative Artificial Intelligence (AI) framework that integrates image, tabular, and simulation-based components to produce synthetic residential building datasets from publicly available county records and images, and we present an end-to-end pipeline instantiating this framework. To mitigate typical Large Language Model (LLM) challenges, we evaluate our model's components using occlusion-based visual focus analysis. Our analysis demonstrates that our selected vision-language model achieves significantly stronger visual focus than a GPT-based alternative for b...
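The abstract does not say how the occlusion-based visual focus analysis is computed. The sketch below shows the generic occlusion-sensitivity idea under stated assumptions: the model is treated as a black-box function returning one scalar score per image, each square patch is masked in turn, and the drop in score is recorded. "Visual focus" is defined here, hypothetically, as the share of the total positive score drop that falls inside a target region (for example, the patches covering the house). The names `occlusion_map`, `focus_fraction`, and `toy_score` are illustrative only; this is not the authors' implementation.

```python
from typing import Callable, List, Set, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def occlusion_map(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> List[List[float]]:
    """Return one score drop per patch: score(original) - score(occluded).

    Larger values mean the (assumed black-box) model relied more on that patch.
    """
    h, w = len(image), len(image[0])
    base = score_fn(image)
    heat: List[List[float]] = []
    for top in range(0, h, patch):
        row: List[float] = []
        for left in range(0, w, patch):
            # Copy the image and overwrite one patch with the fill value.
            masked = [r[:] for r in image]
            for y in range(top, min(top + patch, h)):
                for x in range(left, min(left + patch, w)):
                    masked[y][x] = fill
            row.append(base - score_fn(masked))
        heat.append(row)
    return heat


def focus_fraction(heat: List[List[float]], region: Set[Tuple[int, int]]) -> float:
    """Hypothetical focus metric: share of positive score drop inside `region`.

    `region` holds (row, col) patch indices, e.g. the patches covering the house.
    """
    total = sum(max(v, 0.0) for row in heat for v in row)
    if total == 0.0:
        return 0.0  # occlusion never lowered the score anywhere
    inside = sum(max(heat[r][c], 0.0) for r, c in region)
    return inside / total


# Usage: a toy "model" whose score depends only on the top-left 2x2 pixels.
def toy_score(img: Image) -> float:
    return sum(img[y][x] for y in range(2) for x in range(2)) / 4


image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(image, toy_score, patch=2)
print(heat)  # [[1.0, 0.0], [0.0, 0.0]]
print(focus_fraction(heat, {(0, 0)}))  # 1.0
```

Comparing two models would then amount to computing `focus_fraction` for each on the same images and regions; a model whose score drops mostly when the building itself is hidden would count as having stronger visual focus under this assumed definition.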