[2604.06663] Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach
Computer Science > Computers and Society
arXiv:2604.06663 (cs)
[Submitted on 8 Apr 2026]
Authors: Xiaoyou Qin, Zhihong Li, Xiaoxiao Cheng

Abstract: Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offering scalable "silicon samples" that can approximate human data. However, current simulation practice often collapses diversity into an "average persona," masking subgroup variation that is central to social reality. This study introduces audience segmentation as a systematic approach for restoring heterogeneity in LLM-based social simulation. Using U.S. climate-opinion survey data, we compare six segmentation configurations across two open-weight LLMs (Llama 3.1-70B and Mixtral 8x22B), varying segmentation identifier granularity, parsimony, and selection logic (theory-driven, data-driven, and instrument-based). We evaluate simulation performance with a three-dimensional evaluation framework covering distributional, structural, and predictive fidelity. Results show that increasing identifier granularity does not produce consistent improvement: moderate enrichment can improve performance, but further expansion does not reliably help and can worsen structura...