[2604.04239] Good Rankings, Wrong Probabilities: A Calibration Audit of Multimodal Cancer Survival Models
Computer Science > Machine Learning
arXiv:2604.04239 (cs)
[Submitted on 5 Apr 2026]

Title: Good Rankings, Wrong Probabilities: A Calibration Audit of Multimodal Cancer Survival Models
Authors: Sajad Ghawami

Abstract: Multimodal deep learning models that fuse whole-slide histopathology images with genomic data have achieved strong discriminative performance for cancer survival prediction, as measured by the concordance index. Yet whether the survival probabilities derived from these models, either directly from native outputs or via standard post-hoc reconstruction, are calibrated remains largely unexamined. We conduct, to our knowledge, the first systematic fold-level 1-calibration audit of multimodal WSI-genomics survival architectures, evaluating native discrete-time survival outputs (Experiment A: 3 models on TCGA-BRCA) and Breslow-reconstructed survival curves from scalar risk scores (Experiment B: 11 architectures across 5 TCGA cancer types). In Experiment A, all three models fail 1-calibration on a majority of folds (12 of 15 fold-level tests reject after Benjamini-Hochberg correction). Across the full 290 fold-level tests, 166 reject the null of correct calibration at the median event time after Benjamini-Hochberg correction (FDR = 0.05). MCAT achieves C-index 0.817 on GBMLGG yet fails 1-calibration...
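
The abstract reports rejection counts after Benjamini-Hochberg correction at FDR = 0.05. As a rough illustration of how that correction decides which fold-level tests reject, here is a minimal sketch of the standard step-up procedure. It is not the author's code: the function name `benjamini_hochberg` and the example p-values are hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Standard Benjamini-Hochberg step-up procedure. Illustrative sketch only;
    the name and interface are assumptions, not taken from the paper.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k, even if its own
    # p-value exceeded its own threshold (the "step-up" property).
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject


# Hypothetical p-values from four fold-level calibration tests.
example_p = [0.02, 0.03, 0.035, 0.9]
decisions = benjamini_hochberg(example_p, fdr=0.05)
print(decisions)
print(sum(decisions), "of", len(decisions), "tests reject")
```

In this example the two smallest p-values exceed their own thresholds (0.0125 and 0.025), but the third falls below its threshold (0.0375), so all three are rejected. A count such as "166 of 290 reject" would be the sum of such decisions over all fold-level tests, assuming the correction is applied across the whole family at once.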