[2603.24562] Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
Computer Science > Machine Learning
arXiv:2603.24562 (cs) [Submitted on 25 Mar 2026]

Title: Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction

Authors: Haresh Rengaraj Rajamohan, Xiang Gao, Weicheng Zhu, Shih-Lun Huang, Long Chen, Gabe Schulman, Huizhen Jin, Shengduo Li, Yixuan Wang, Huidi Yang, Kyunghyun Cho, Cem M. Deniz, Narges Razavian

Abstract: While large-scale pretraining has revolutionized language modeling, its potential remains underexplored in healthcare with structured electronic health records (EHRs). We present RAVEN, a novel generative pretraining strategy for sequential EHR data based on Recurrence-Aware next-Visit EveNt prediction. Leveraging a dataset of over one million unique individuals, our model learns to autoregressively generate tokenized clinical events for the next visit conditioned on patient history. We introduce regularization on predicting repeated events and highlight a key pitfall in EHR-based foundation model evaluations: repeated event tokens can inflate performance metrics when new onsets are not distinguished from subsequent occurrences. Furthermore, we empirically investigate the scaling behaviors in a data-constrained, compute-saturated regime, showing that simply increasing model size is suboptimal wi...
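The evaluation pitfall the abstract names can be made concrete: if a model is scored on all next-visit events at once, correctly "predicting" a chronic code that already appears in the history (a recurrence) counts the same as predicting a genuinely new diagnosis (an onset). A minimal sketch of a stratified evaluation, under assumed data shapes (visits as lists of event-code strings; the helper names `split_events` and `recall` are hypothetical, not from the paper):

```python
def split_events(history, next_visit):
    """Split next-visit events into new onsets vs. recurrences.

    history: list of prior visits, each a list of event codes (assumed format).
    next_visit: list of event codes for the visit being predicted.
    """
    seen = set().union(*history) if history else set()
    new_onsets = [e for e in next_visit if e not in seen]
    recurrences = [e for e in next_visit if e in seen]
    return new_onsets, recurrences


def recall(predicted, actual):
    """Fraction of actual events that the model predicted; None if no targets."""
    actual = set(actual)
    if not actual:
        return None
    return len(set(predicted) & actual) / len(actual)


# Toy example (codes are illustrative): a patient with prior diabetes and
# hypertension visits returns with hypertension again plus new-onset CKD.
history = [["dm2", "htn"], ["htn"]]
next_visit = ["htn", "ckd"]
predicted = ["htn"]  # model only echoes the repeated chronic code

onsets, recurs = split_events(history, next_visit)
overall = recall(predicted, next_visit)   # 0.5 - looks moderately good
onset_r = recall(predicted, onsets)       # 0.0 - the clinically new event is missed
recur_r = recall(predicted, recurs)       # 1.0 - repetition inflates the aggregate
```

The pooled recall of 0.5 here is driven entirely by the repeated token; reporting onset and recurrence recall separately exposes that the model predicted nothing new, which is the distinction the abstract argues evaluations should make.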