[2504.19467] BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Computer Science > Computation and Language

arXiv:2504.19467 (cs)

[Submitted on 28 Apr 2025 (v1), last revised 29 Mar 2026 (this version, v4)]

Title: BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text

Authors: Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder, Yixing Jiang, Valentina Carducci, Richard Wyss, Rishi J Desai, Emily Alsentzer, Leo Anthony Celi, Adam Rodman, Sebastian Schneeweiss, Jonathan H. Chen, Santiago Romero-Brufau, Kueiyu Joshua Lin, Jie Yang

Abstract: Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models released at an accelerated pace. However, benchmarking on large-scale real-world data such as electronic health records (EHRs) is critical, as clinical decisions are directly informed by these sources, yet current evaluations remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world clinical data. Others focus narrowly on specific application scenarios, limiting their generalizability across broader clinical use. To address this gap, we present BRIDGE, a comprehensive multilingual benchmark comprising 87 tasks sourced from real-world clinical data sources across nine languages. It cove...