[2604.00027] Multi-lingual Multi-institutional Electronic Health Record based Predictive Model
arXiv:2604.00027 [cs], Computer Science: Computation and Language
Submitted on 11 Mar 2026
Authors: Kyunghoon Hur, Heeyoung Kwak, Jinsu Jang, Nakhwan Kim, Edward Choi

Abstract: Large-scale EHR prediction across institutions is hindered by substantial heterogeneity in schemas and code systems. Although Common Data Models (CDMs) can standardize records for multi-institutional learning, the manual harmonization and vocabulary mapping they require are costly and difficult to scale. Text-based harmonization offers an alternative: it converts raw EHR data into a unified textual form, enabling pooled learning without explicit standardization. However, applying this paradigm to multi-national datasets introduces an additional layer of heterogeneity: language, which must be addressed for truly scalable EHR learning. In this work, we investigate multilingual multi-institutional learning for EHR prediction, aiming to enable pooled training across multinational ICU datasets without manual standardization. We compare two practical strategies for handling language barriers: (i) directly modeling multilingual records with multilingual encoders, and (ii) translating non-English records into English via LLM-based word-level transla...
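The abstract does not specify how raw records are turned into text or how word-level translation is applied. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes each EHR event is one table row, serialized as "table column value ..." tokens with column names and values kept verbatim. It also assumes word-level translation is a token-by-token dictionary lookup. The table name `vitals`, the columns `item`, `value`, `unit`, `note`, and the `WORD_TRANSLATIONS` dictionary are all hypothetical. The dictionary stands in for LLM-produced translations, and no LLM is called.

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical word-level dictionary standing in for LLM-produced
# translations (strategy ii); this sketch does not call any LLM.
WORD_TRANSLATIONS: Dict[str, str] = {
    "수축기": "systolic",
    "혈압": "blood pressure",
}


def serialize_event(table: str, row: Mapping[str, Optional[object]]) -> str:
    """Flatten one raw EHR row into 'table column value ...' text.

    Column names and values are kept exactly as they appear in the source
    schema, so rows from different institutions become plain strings
    without any mapping to a Common Data Model.
    """
    parts: List[str] = [table]
    for column, value in row.items():
        if value is None:
            continue  # skip missing fields rather than emitting "None"
        parts.append(f"{column} {value}")
    return " ".join(parts)


def translate_words(text: str, table: Mapping[str, str]) -> str:
    """Word-level translation: replace each whitespace-separated token
    that has a dictionary entry and leave every other token unchanged."""
    return " ".join(table.get(word, word) for word in text.split())


if __name__ == "__main__":
    raw = serialize_event(
        "vitals", {"item": "수축기 혈압", "value": 120, "unit": "mmHg", "note": None}
    )
    # Strategy (i): feed the original-language text to a multilingual encoder.
    print(raw)  # vitals item 수축기 혈압 value 120 unit mmHg
    # Strategy (ii): translate word by word into English first.
    print(translate_words(raw, WORD_TRANSLATIONS))
    # vitals item systolic blood pressure value 120 unit mmHg
```

Because translation runs on the serialized string, any non-English column names present in the dictionary would be translated as well. Numbers and units pass through unchanged because they have no dictionary entry.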