[2602.18462] Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
Summary
This article evaluates the reliability of persona-conditioned large language models (LLMs) as synthetic survey respondents, finding that persona prompting yields no clear improvement in survey alignment and in many cases distorts results.
Why It Matters
Understanding the reliability of LLMs in survey contexts is crucial for researchers in computational social science. The findings challenge the effectiveness of persona conditioning, highlighting potential biases and inaccuracies that could mislead analyses and decision-making.
Key Takeaways
- Persona prompting does not consistently improve LLM reliability in surveys.
- In many cases, persona conditioning can degrade performance and introduce biases.
- Demographic conditioning can redistribute errors, affecting subgroup fidelity.
- Most survey items show minimal change, but some experience significant distortions.
- The study emphasizes the need for careful evaluation of simulation practices in social science.
Computer Science > Computers and Society
arXiv:2602.18462 (cs) [Submitted on 6 Feb 2026]
Title: Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
Authors: Erika Elizabeth Taday Morocho, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci
Abstract: Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet, it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Persona effects are highly heterogeneous as most items exhibit minimal change, while a small subset of questions and underrepresented subgroups experience disproportionate distortions. Our findings highlight a key adverse impact of current persona-based simulation practices: demographic conditioning can redistribute error in ways...
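As a rough illustration of the kind of comparison the abstract describes (not the paper's actual evaluation code), survey alignment can be measured as the fraction of respondent-item instances where a model's answer matches the real response, then compared against the expected accuracy of a uniform random guesser. The item, answer options, and predictions below are entirely hypothetical:

```python
def random_guess_accuracy(num_options: int) -> float:
    """Expected accuracy of a uniform random guesser on an item with num_options choices."""
    return 1.0 / num_options

def alignment(predictions, truths):
    """Fraction of respondent-item instances where the model's answer matches the survey response."""
    assert len(predictions) == len(truths)
    return sum(p == t for p, t in zip(predictions, truths)) / len(predictions)

# Toy data: a hypothetical 4-option survey item answered by 8 respondents.
truths = ["A", "B", "A", "C", "A", "D", "B", "A"]
persona_preds = ["A", "B", "C", "C", "A", "A", "B", "A"]  # hypothetical persona-conditioned outputs

print(alignment(persona_preds, truths))   # model alignment on this item
print(random_guess_accuracy(4))           # random-guesser baseline: 0.25
```

In the paper's setting this comparison is run over more than 70K respondent-item instances and broken down per item and per demographic subgroup, which is how the heterogeneous, subgroup-specific distortions become visible.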