[2510.21011] Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations
Computer Science > Human-Computer Interaction
arXiv:2510.21011 (cs)
[Submitted on 23 Oct 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Authors: Ilona van der Linden, Sahana Kumar, Arnav Dixit, Aadi Sudan, Smruthi Danda, David C. Anastasiu, Kai Lukoff

Abstract: As generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31pp) and Black (-9pp) workers are consistently underrepresented, while Hispanic (+17pp) and Asian (+12pp) workers are overrepresented, with stereotype exaggeration ...