[2601.16529] SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
Computer Science > Artificial Intelligence
arXiv:2601.16529 (cs)
[Submitted on 23 Jan 2026 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
Authors: Dongshen Peng, Yi Wang, Austin Schoeffler, Carl Preiksaitis, Christian Rose

Abstract: Large language models (LLMs) show promise in clinical decision support, yet they risk acquiescing to patient pressure for inappropriate care. We introduce SycoEval-EM, a multi-agent simulation framework that evaluates LLM robustness to adversarial patient persuasion in emergency medicine. Across 20 LLMs and 1,875 encounters spanning three Choosing Wisely scenarios, acquiescence rates ranged from 0% to 100%. Models were more vulnerable to imaging requests (38.8%) than to opioid prescriptions (25.0%), and model capability was a poor predictor of robustness. All persuasion tactics proved roughly equally effective (30.0-36.0%), indicating general susceptibility rather than tactic-specific weakness. Our findings demonstrate that static benchmarks inadequately predict safety under social pressure, motivating multi-turn adversarial testing for clinical AI certification.

Subjects: Artificial Intelligence (cs.AI); Human-Computer In...