[2602.15270] Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models
Summary
This paper presents a method for generating synthetic populations by jointly integrating and synthesizing multi-source data with a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. It targets the limited diversity and feasibility of the synthetic data that feed agent-based models.
Why It Matters
The ability to generate realistic synthetic populations is crucial for urban planning and transportation modeling. This research improves the diversity and feasibility of the synthetic populations used as input to agent-based models, which can support better decision-making in urban development and policy.
Key Takeaways
- Proposes a joint learning approach using WGAN to synthesize multi-source datasets.
- Improves diversity and feasibility of synthetic data, crucial for agent-based models.
- Reports a 7% increase in recall and a 15% increase in precision over traditional methods.
- Introduces a regularization term (inverse gradient penalty) in the generator loss function to improve the diversity and feasibility of the synthetic data.
- Achieves a higher overall similarity score compared to sequential methods.
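To make the recall and precision figures above more concrete, here is a minimal sketch of how such metrics can be computed over attribute combinations. It assumes a set-based definition that is common in population synthesis work: precision as the share of generated records whose combination exists in a reference population (a feasibility proxy), and recall as the share of distinct reference combinations that the generator reproduces (a diversity proxy). The paper's exact definitions may differ; the function name, attributes, and toy data are hypothetical.

```python
from typing import Hashable, Iterable, Tuple

Combo = Tuple[Hashable, ...]  # one record = one tuple of attribute values


def precision_recall(generated: Iterable[Combo],
                     reference: Iterable[Combo]) -> Tuple[float, float]:
    """Set-based precision and recall over attribute combinations.

    Assumed definitions (not taken from the paper):
      precision = generated records whose combination exists in the reference
                  population, divided by all generated records
      recall    = distinct reference combinations that appear at least once in
                  the generated data, divided by all distinct reference combinations
    """
    gen = list(generated)
    ref_set = set(reference)
    if not gen or not ref_set:
        raise ValueError("both inputs must be non-empty")
    precision = sum(1 for g in gen if g in ref_set) / len(gen)
    recall = len(set(gen) & ref_set) / len(ref_set)
    return precision, recall


# Toy data with hypothetical attributes: (age_group, employment_status)
reference = [("child", "student"), ("adult", "employed"),
             ("adult", "student"), ("senior", "retired")]
generated = [("adult", "employed"), ("adult", "employed"),
             ("child", "employed"), ("senior", "retired")]

precision, recall = precision_recall(generated, reference)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case, three of the four generated records are valid combinations, while only two of the four reference combinations are covered, so a generator can score well on feasibility and still lack diversity.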
Computer Science > Artificial Intelligence
arXiv:2602.15270 (cs)
[Submitted on 17 Feb 2026]
Title: Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models
Authors: Farbod Abbasi, Zachary Patterson, Bilal Farooq
Abstract: Generating realistic synthetic populations is essential for agent-based models (ABM) in transportation and urban planning. Current methods face two major limitations. First, many rely on a single dataset or follow a sequential data fusion and generation process, which means they fail to capture the complex interplay between features. Second, these approaches struggle with sampling zeros (valid but unobserved attribute combinations) and structural zeros (infeasible combinations due to logical constraints), which reduce the diversity and feasibility of the generated data. This study proposes a novel method to simultaneously integrate and synthesize multi-source datasets using a Wasserstein Generative Adversarial Network (WGAN) with gradient penalty. This joint learning method improves both the diversity and feasibility of synthetic data by defining a regularization term (inverse gradient penalty) for the generator loss function. For the evaluation, we implement a unified evaluation metric for similarit...
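The distinction between sampling zeros and structural zeros drawn in the abstract can be illustrated with a short sketch. It sorts generated records into three groups: combinations already seen in the training sample, sampling zeros (unseen but logically valid), and structural zeros (violating a logical constraint). The attributes, the constraint in `is_feasible`, and the toy data are all hypothetical and serve only to show the idea; they are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

Record = Tuple[str, str]  # hypothetical attributes: (age_group, employment_status)


def is_feasible(record: Record) -> bool:
    """Hypothetical logical constraint: a child cannot be employed or retired."""
    age_group, employment = record
    return not (age_group == "child" and employment in {"employed", "retired"})


def classify(generated: Iterable[Record],
             observed: Iterable[Record],
             feasible: Callable[[Record], bool] = is_feasible) -> Counter:
    """Count generated records by category.

    observed        : combination appears in the training sample
    sampling_zero   : combination is unseen in the sample but passes the constraint
    structural_zero : combination is unseen in the sample and violates the constraint
    """
    observed_set = set(observed)
    counts: Counter = Counter()
    for rec in generated:
        if rec in observed_set:
            counts["observed"] += 1
        elif feasible(rec):
            counts["sampling_zero"] += 1
        else:
            counts["structural_zero"] += 1
    return counts


observed = [("child", "student"), ("adult", "employed"), ("senior", "retired")]
generated = [("adult", "employed"), ("adult", "student"),
             ("child", "employed"), ("senior", "employed")]

counts = classify(generated, observed)
print(dict(counts))
```

Under this framing, generating sampling zeros adds diversity, whereas generating structural zeros reduces feasibility, which is the trade-off the abstract describes.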