[2602.18462] Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents

arXiv - AI · 3 min read

Summary

This article evaluates the reliability of persona-conditioned large language models (LLMs) as synthetic survey respondents, finding that persona prompting does not consistently improve survey alignment and can actively distort results.

Why It Matters

Understanding the reliability of LLMs in survey contexts is crucial for researchers in computational social science. The findings challenge the effectiveness of persona conditioning, highlighting potential biases and inaccuracies that could mislead analyses and decision-making.

Key Takeaways

  • Persona prompting does not consistently improve LLM reliability in surveys.
  • In many cases, persona conditioning can degrade performance and introduce biases.
  • Demographic conditioning can redistribute errors, affecting subgroup fidelity.
  • Most survey items show minimal change, but some experience significant distortions.
  • The study emphasizes the need for careful evaluation of simulation practices in social science.

Computer Science > Computers and Society
arXiv:2602.18462 (cs) [Submitted on 6 Feb 2026]

Title: Assessing the Reliability of Persona-Conditioned LLMs as Synthetic Survey Respondents
Authors: Erika Elizabeth Taday Morocho, Lorenzo Cima, Tiziano Fagni, Marco Avvenuti, Stefano Cresci

Abstract: Using persona-conditioned LLMs as synthetic survey respondents has become a common practice in computational social science and agent-based simulations. Yet, it remains unclear whether multi-attribute persona prompting improves LLM reliability or instead introduces distortions. Here we contribute to this assessment by leveraging a large dataset of U.S. microdata from the World Values Survey. Concretely, we evaluate two open-weight chat models and a random-guesser baseline across more than 70K respondent-item instances. We find that persona prompting does not yield a clear aggregate improvement in survey alignment and, in many cases, significantly degrades performance. Persona effects are highly heterogeneous as most items exhibit minimal change, while a small subset of questions and underrepresented subgroups experience disproportionate distortions. Our findings highlight a key adverse impact of current persona-based simulation practices: demographic conditioning can redistribute error in ways...
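To make the evaluation setup concrete, the sketch below shows what multi-attribute persona prompting for a single survey item might look like, alongside the kind of random-guesser baseline the paper compares against. The persona attributes, item wording, and answer scale are illustrative assumptions, not details taken from the paper or from the World Values Survey microdata.

```python
import random

# Hypothetical multi-attribute persona (illustrative only; the paper
# conditions on attributes drawn from real U.S. WVS respondents).
persona = {
    "age": 34,
    "sex": "female",
    "education": "bachelor's degree",
    "region": "Midwest",
}

# Hypothetical WVS-style survey item with a fixed answer scale.
item = "How important is family in your life?"
scale = [
    "Very important",
    "Rather important",
    "Not very important",
    "Not at all important",
]

def build_persona_prompt(persona: dict, item: str, scale: list) -> str:
    """Render a multi-attribute persona plus one survey item as a chat prompt."""
    attrs = "; ".join(f"{k}: {v}" for k, v in persona.items())
    options = "\n".join(f"- {o}" for o in scale)
    return (
        f"Answer the survey question as this person: {attrs}.\n"
        f"Question: {item}\n"
        f"Choose exactly one option:\n{options}"
    )

def random_guesser(scale: list) -> str:
    """Baseline that ignores persona and item and answers uniformly at random."""
    return random.choice(scale)

print(build_persona_prompt(persona, item, scale))
print("Random-guesser answer:", random_guesser(scale))
```

In the paper's setup, the option a chat model selects is compared against the real respondent's recorded answer across more than 70K respondent-item instances; the random guesser serves as the reference point that persona conditioning would need to clearly beat.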

Related Articles

LLMs

[P] Remote sensing foundation models made easy to use.

This project enables the idea of tasking remote sensing models to acquire embeddings like we task satellites to acquire data! https://git...

Reddit - Machine Learning · 1 min
LLMs

I stopped using Claude like a chatbot — 7 prompt shifts that reclaimed 10 hours of my week

submitted by /u/ThereWas

Reddit - Artificial Intelligence · 1 min
LLMs

What features do you actually want in an AI chatbot that nobody has built yet?

Hey everyone 👋 I'm building a new AI chat app and before I build anything I want to hear from real users first. Current AI tools like Cha...

Reddit - Artificial Intelligence · 1 min
LLMs

So, what exactly is going on with the Claude usage limits?

I'm extremely new to AI and am building a local agent for fun. I purchased a Claude Pro account because it helped me a lot in the past wh...

Reddit - Artificial Intelligence · 1 min