[2602.21648] Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction
Summary
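This paper presents a multimodal machine learning framework for predicting 5-year overall survival in breast cancer. It integrates clinical variables with transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort and emphasizes calibration, fairness, and reproducibility.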
Why It Matters
Accurate breast cancer risk prediction is crucial for treatment planning. This study addresses common pitfalls of clinical risk models, such as poor calibration and subgroup disparities, with a reproducible framework that audits fairness across patient subgroups. Better-calibrated, more equitable models could in turn improve patient outcomes.
Key Takeaways
- Introduces a multimodal framework for breast cancer survival prediction.
- Compares an elastic-net regularized Cox model (CoxNet) with gradient-boosted survival trees (XGBoost) on performance and fairness.
- Reports high discrimination, with AUCs of 98.3% and 98.6% for CoxNet and XGBoost, respectively.
- Emphasizes the importance of fairness auditing across diverse patient demographics.
- Provides a reproducible approach that can be applied to other clinical datasets.
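The subgroup fairness audit mentioned above can be sketched as measuring discrimination and calibration separately within each patient group and then comparing groups. This is a minimal illustration under stated assumptions, not the paper's procedure. It assumes a binary 5-year outcome (1 = death within five years) with patients censored before five years already removed. It uses a plain rank-based (Mann-Whitney) AUC rather than a censoring-weighted time-dependent AUC, and it measures calibration-in-the-large as mean predicted risk minus observed event rate. The helpers `auc` and `subgroup_audit`, the group labels, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def auc(labels: Sequence[int], risks: Sequence[float]) -> float:
    """Rank-based AUC: P(random event case outscores random non-event), ties count 1/2."""
    pos = [r for y, r in zip(labels, risks) if y == 1]
    neg = [r for y, r in zip(labels, risks) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def subgroup_audit(
    labels: Sequence[int], risks: Sequence[float], groups: Sequence[str]
) -> Dict[str, dict]:
    """Per-group sample size, AUC, and calibration-in-the-large (mean risk minus event rate)."""
    buckets = defaultdict(lambda: ([], []))  # group -> (labels, risks)
    for y, r, g in zip(labels, risks, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(r)
    report = {}
    for g, (ys, rs) in buckets.items():
        report[g] = {
            "n": len(ys),
            "auc": auc(ys, rs),
            "calib_gap": sum(rs) / len(rs) - sum(ys) / len(ys),
        }
    return report


# Toy data (hypothetical): predicted 5-year mortality risks for two groups, A and B.
labels = [1, 0, 1, 0, 1, 0, 0, 1]
risks = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = subgroup_audit(labels, risks, groups)
aucs = [m["auc"] for m in report.values()]
print(report)
print("max AUC gap:", max(aucs) - min(aucs))  # 1.0 - 0.75 = 0.25 for this toy data
```

In a real audit the per-group AUC would be swapped for a censoring-aware metric such as a time-dependent AUC. Small subgroups would also need confidence intervals (for example, by bootstrapping) before a gap is read as a genuine disparity.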
arXiv:2602.21648 (Computer Science, Machine Learning). Submitted on 25 Feb 2026. Author: Toktam Khatibi.
Abstract
Clinical risk prediction models often underperform in real-world settings due to poor calibration, limited transportability, and subgroup disparities. These challenges are amplified in high-dimensional multimodal cancer datasets characterized by complex feature interactions and a p >> n structure. We present a fully reproducible multimodal machine learning framework for 5-year overall survival prediction in breast cancer, integrating clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort. After variance- and sparsity-based filtering and dimensionality reduction, models were trained using stratified train/validation/test splits with validation-based hyperparameter tuning. Two survival approaches were compared: an elastic-net regularized Cox model (CoxNet) and a gradient-boosted survival tree model implemented using XGBoost. CoxNet provides embedded feature selection and stable estimation, whereas XGBoost captures nonlinear effects and higher-order interactions. Performance was assessed using time-depend...
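The elastic-net Cox model named in the abstract can be illustrated through the objective it minimizes. The following is a minimal sketch, not the author's implementation. For a fixed coefficient vector it evaluates a Breslow-style negative log partial likelihood, scaled by 1/n, plus a glmnet-style elastic-net penalty lam * (alpha * ||beta||_1 + 0.5 * (1 - alpha) * ||beta||_2^2). The 1/n scaling, the tie handling, the names `coxnet_objective`, `lam`, and `alpha`, and the toy data are all assumptions, and libraries differ in these constants. Real fitting routines (for example glmnet, or scikit-survival's `CoxnetSurvivalAnalysis`) optimize beta along a path of penalty values rather than scoring a single beta.

```python
import math
from typing import Sequence


def neg_log_partial_likelihood(
    times: Sequence[float],
    events: Sequence[int],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
) -> float:
    """Negative Cox log partial likelihood (Breslow approximation for tied event times)."""
    # Linear predictor eta_i = x_i . beta for every patient.
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored patients contribute only through the risk sets
        # Risk set R(t_i): patients still under observation at t_i (includes i itself).
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(e - m) for e in risk))
        nll -= eta[i] - log_denom
    return nll


def coxnet_objective(times, events, X, beta, lam: float, alpha: float) -> float:
    """(1/n) * negative log partial likelihood + elastic-net penalty (assumed glmnet-style form)."""
    n = len(times)
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    penalty = lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_sq)
    return neg_log_partial_likelihood(times, events, X, beta) / n + penalty


# Toy data (hypothetical): follow-up in years, 1 = death observed, 0 = censored.
times = [5.0, 3.0, 8.0, 2.0]
events = [1, 0, 1, 1]
X = [[0.2], [1.0], [-0.5], [1.5]]
print(coxnet_objective(times, events, X, [0.0], lam=0.1, alpha=0.5))  # log(8)/4, about 0.5199
```

The L1 term is the source of CoxNet's embedded feature selection: minimizing this objective with a larger `lam` drives many coefficients exactly to zero. For the boosted alternative, XGBoost's `survival:cox` objective fits a Cox partial likelihood with trees in place of the linear predictor and expects right-censored times encoded as negative labels.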