[2603.20225] The Arrival of AGI? When Expert Personas Exceed Expert Benchmarks
Computer Science > Computers and Society
arXiv:2603.20225 (cs)
[Submitted on 4 Mar 2026]

Title: The Arrival of AGI? When Expert Personas Exceed Expert Benchmarks
Authors: Drake Mullens, Stella Shen

Abstract: Do expert personas improve language model performance? The Wharton Generative AI Lab reports that they do not, broadcasting to millions via social media the recommendation that practitioners abandon a technique recommended by Anthropic, Google, and OpenAI. We demonstrate that this null finding was structurally predictable. Five core mechanisms precluded detection before data collection began: baseline contamination elevating the starting point to near-ceiling, system prompt hierarchy subordinating experimental manipulation, impossible expert specifications collapsing to generic competence, format constraints suppressing reasoning processes, and provider exclusion limiting generalizability. Controlled trials correcting these limitations reveal what the original design obscured. To test this, we selected the GPQA Diamond hardest questions to prevent baseline pattern matching, forcing reliance on genuine expert reasoning. On items with valid key answers, expert personas achieve ceiling accuracy. They eliminated all baseline errors through confidence amplification. Furthermore, forensic examination of model divergence identified that half...