[2602.22752] Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction
Summary
This article presents a study on the operational validity of using Large Language Models (LLMs) to simulate social media user behavior through Conditioned Comment Prediction (CCP).
Why It Matters
As LLMs become integral to the social sciences, validating their ability to simulate user behavior accurately is crucial for understanding digital interactions. This research challenges naive prompting paradigms and offers guidelines for improving the fidelity of these simulations, with implications for AI and social media analysis.
Key Takeaways
- Conditioned Comment Prediction (CCP) is introduced as a method for simulating social media interactions with LLMs.
- The study reveals a critical decoupling of form and content in low-resource settings: Supervised Fine-Tuning (SFT) aligns the surface structure of the output (length and syntax) but degrades semantic grounding.
- Explicit conditioning becomes unnecessary under fine-tuning, as models can infer behavior from histories.
- Findings challenge naive prompting methods and suggest prioritizing authentic behavioral data for simulations.
- The research emphasizes the importance of operational validity in using LLMs for social science applications.
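
To make the takeaways above more concrete, here is a minimal Python sketch of a CCP-style setup. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the prompt templates or evaluation metrics, so `UserContext`, `build_prompt`, `length_ratio`, and `token_jaccard` are hypothetical names. The sketch also assumes that "explicit" conditioning means supplying a written user profile and "implicit" conditioning means supplying the user's past comments. The two scoring functions are simple stand-ins for a form measure and a content measure.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Hypothetical container for what is known about one user."""
    profile: str        # explicit description of the user (assumed format)
    history: list[str]  # the user's past authentic comments


def build_prompt(stimulus: str, user: UserContext, explicit: bool) -> str:
    """Assemble a CCP-style prompt; the wording is illustrative only."""
    if explicit:
        # Explicit conditioning (assumed): describe the user directly.
        conditioning = f"User profile: {user.profile}"
    else:
        # Implicit conditioning (assumed): let the model infer from history.
        past = "\n".join(f"- {c}" for c in user.history)
        conditioning = f"Past comments by this user:\n{past}"
    return f"{conditioning}\n\nPost: {stimulus}\n\nComment by this user:"


def length_ratio(generated: str, authentic: str) -> float:
    """Form proxy: shorter token count divided by longer (1.0 = same length)."""
    g, a = len(generated.split()), len(authentic.split())
    if max(g, a) == 0:
        return 1.0
    return min(g, a) / max(g, a)


def token_jaccard(generated: str, authentic: str) -> float:
    """Content proxy: Jaccard overlap of lowercased token sets."""
    g = set(generated.lower().split())
    a = set(authentic.lower().split())
    if not g and not a:
        return 1.0
    return len(g & a) / len(g | a)
```

Under these stand-in measures, a generated comment can match the authentic one in length (`length_ratio` near 1.0) while sharing almost no vocabulary with it (`token_jaccard` near 0.0), which is the kind of form versus content gap the takeaways describe.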
Computer Science > Computation and Language
arXiv:2602.22752 (cs)
[Submitted on 26 Feb 2026]
Title: Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction
Authors: Nils Schwager, Simon Münker, Alistair Plum, Achim Rettinger
Abstract: The transition of Large Language Models (LLMs) from exploratory tools to active "silicon subjects" in social science lacks extensive validation of operational validity. This study introduces Conditioned Comment Prediction (CCP), a task in which a model predicts how a user would comment on a given stimulus by comparing generated outputs with authentic digital traces. This framework enables a rigorous evaluation of current LLM capabilities with respect to the simulation of social media user behavior. We evaluated open-weight 8B models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish language scenarios. By systematically comparing prompting strategies (explicit vs. implicit) and the impact of Supervised Fine-Tuning (SFT), we identify a critical form vs. content decoupling in low-resource settings: while SFT aligns the surface structure of the text output (length and syntax), it degrades semantic grounding. Furthermore, we demonstrate that explicit conditio...