[2512.01210] Knowledge Graph Augmented Large Language Models for Disease Prediction
Computer Science > Artificial Intelligence
arXiv:2512.01210 (cs)
[Submitted on 1 Dec 2025 (v1), last revised 2 Mar 2026 (this version, v3)]

Title: Knowledge Graph Augmented Large Language Models for Disease Prediction
Authors: Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel

Abstract: Electronic health records (EHRs) enable strong clinical prediction, but explanations are often coarse and hard to use for patient-level decisions. We propose a knowledge graph (KG)-guided chain-of-thought (CoT) framework for visit-level disease prediction on MIMIC-III. We map ICD-9 codes to PrimeKG, mine disease-relevant nodes and paths, and use these paths to scaffold temporally consistent CoT rationales, retaining only samples whose conclusions match observed outcomes. We fine-tune lightweight instruction-tuned LLMs (LLaMA-3.1-Instruct-8B and Gemma-7B) on two small cohorts (400 and 1,000 index visits) across ten PrimeKG-mapped diseases. Our models outperform strong classical baselines, reaching AUROC 0.66-0.70 and macro-AUPR 0.40-0.47. Without additional training, the models transfer zero-shot to the CRADLE cohort, improving accuracy from 0.40-0.51 to 0.72-0.77. In a blinded clinician study, KG-guided CoT rationales are consistently preferred for clarity, relevance, and correctness. Code is avail...
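
The abstract's filtering step (retaining only rationales whose conclusions match observed outcomes) could look roughly like the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the `RationaleSample` schema, the `filter_consistent` helper, the visit IDs, and the placeholder disease names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RationaleSample:
    """One generated CoT rationale for an index visit (hypothetical schema)."""
    visit_id: str
    disease: str
    rationale: str
    predicted_positive: bool  # conclusion the rationale arrives at


def filter_consistent(samples, observed):
    """Keep rationales whose conclusion matches the observed outcome.

    `observed` maps (visit_id, disease) -> bool. Samples with no recorded
    outcome are dropped, since their conclusion cannot be checked.
    """
    kept = []
    for s in samples:
        label = observed.get((s.visit_id, s.disease))
        if label is not None and label == s.predicted_positive:
            kept.append(s)
    return kept


# Toy data with placeholder names; not drawn from MIMIC-III or PrimeKG.
samples = [
    RationaleSample("v1", "disease_a", "path-based reasoning text", True),
    RationaleSample("v1", "disease_b", "path-based reasoning text", True),
    RationaleSample("v2", "disease_a", "path-based reasoning text", False),
]
observed = {
    ("v1", "disease_a"): True,
    ("v1", "disease_b"): False,  # rationale concluded positive: rejected
    ("v2", "disease_a"): False,
}

kept = filter_consistent(samples, observed)
print([(s.visit_id, s.disease) for s in kept])
```

In this toy run, the rationale for `("v1", "disease_b")` is discarded because it concludes positive while the observed outcome is negative; the other two are kept as candidate fine-tuning examples.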