[2601.16529] SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
Computer Science > Artificial Intelligence
arXiv:2601.16529 (cs)
[Submitted on 23 Jan 2026 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care
Authors: Dongshen Peng, Yi Wang, Austin Schoeffler, Carl Preiksaitis, Christian Rose

Abstract: Large language models (LLMs) show promise in clinical decision support, yet they risk acquiescing to patient pressure for inappropriate care. We introduce SycoEval-EM, a multi-agent simulation framework that evaluates LLM robustness to adversarial patient persuasion in emergency medicine. Across 20 LLMs and 1,875 encounters spanning three Choosing Wisely scenarios, acquiescence rates ranged from 0% to 100%. Models were more vulnerable to imaging requests (38.8%) than to opioid prescriptions (25.0%), and model capability was a poor predictor of robustness. All persuasion tactics proved roughly equally effective (30.0-36.0%), indicating general susceptibility rather than tactic-specific weakness. Our findings demonstrate that static benchmarks inadequately predict safety under social pressure, motivating multi-turn adversarial testing for clinical AI certification.

Subjects: Artificial Intelligence (cs.AI); Human-Computer In...