[2602.17402] A Contrastive Variational AutoEncoder for NSCLC Survival Prediction with Missing Modalities

arXiv - AI, February 20, 2026

Summary

This paper presents a Multimodal Contrastive Variational AutoEncoder (MCVAE) designed to improve survival prediction for non-small cell lung cancer (NSCLC) patients, particularly in cases with missing data modalities.

Why It Matters

Accurate survival prediction for NSCLC is critical for patient management and treatment planning. This research addresses the common issue of incomplete clinical data, proposing a robust model that enhances predictive accuracy even with missing modalities, which is highly relevant in real-world clinical settings.

Key Takeaways

The MCVAE model effectively integrates multiple data modalities for improved survival predictions.
Stochastic modality masking enhances the model's robustness against missing data.
The study demonstrates that multimodal integration does not always yield better results, emphasizing the need for careful model design.
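To make the second takeaway concrete, stochastic modality masking can be sketched as randomly hiding modalities that were actually observed during training, so the model learns to cope with patients whose data are incomplete. This is a minimal illustration, not the authors' implementation: the function name mask_modalities, the drop_prob hyperparameter, and the rule of always keeping at least one observed modality are assumptions.

```python
import numpy as np

def mask_modalities(present, drop_prob, rng):
    """Randomly hide observed modalities (illustrative sketch, not the paper's code).

    present:   (batch, n_modalities) boolean array, True where a modality was observed.
    drop_prob: probability of hiding each observed modality (assumed hyperparameter).
    rng:       a numpy.random.Generator.
    Returns a boolean array of the same shape marking the modalities left visible.
    """
    present = np.asarray(present, dtype=bool)
    drop = rng.random(present.shape) < drop_prob
    kept = present & ~drop

    # Assumption: never hide every observed modality of a patient.
    emptied = present.any(axis=1) & ~kept.any(axis=1)
    for i in np.flatnonzero(emptied):
        # Restore one observed modality chosen uniformly at random.
        j = rng.choice(np.flatnonzero(present[i]))
        kept[i, j] = True
    return kept

# Columns could stand for whole-slide images, transcriptomics, DNA methylation.
rng = np.random.default_rng(0)
present = np.array([[True, True, True],
                    [True, False, False],
                    [False, True, True]])
kept = mask_modalities(present, drop_prob=0.5, rng=rng)
```

The mask only ever removes modalities; it never marks a truly missing modality as available.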

arXiv:2602.17402 (cs), Computer Science > Artificial Intelligence
Submitted on 19 Feb 2026

Title: A Contrastive Variational AutoEncoder for NSCLC Survival Prediction with Missing Modalities

Authors: Michele Zanitti, Vanja Miskovic, Francesco Trovò, Alessandra Laura Giulia Pedrocchi, Ming Shen, Yan Kyaw Tun, Arsela Prelaj, Sokol Kosta

Abstract: Predicting survival outcomes for non-small cell lung cancer (NSCLC) patients is challenging due to the different individual prognostic features. This task can benefit from the integration of whole-slide images, bulk transcriptomics, and DNA methylation, which offer complementary views of the patient's condition at diagnosis. However, real-world clinical datasets are often incomplete, with entire modalities missing for a significant fraction of patients. State-of-the-art models rely on available data to create patient-level representations or use generative models to infer missing modalities, but they lack robustness in cases of severe missingness. We propose a Multimodal Contrastive Variational AutoEncoder (MCVAE) to address this issue: modality-specific variational encoders capture the uncertainty in each data source, and a fusion bottleneck with learned gating mechanisms is introduced to normalize the contributions from present modalities. We propose a multi-task o...
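One way to picture a gated fusion that normalizes contributions from present modalities is a softmax over gate scores restricted to the modalities that are available, followed by a weighted sum of the per-modality latent vectors. The abstract does not specify the gating function, so the masked softmax below, the function name gated_fusion, and the array shapes are all assumptions made for illustration.

```python
import numpy as np

def gated_fusion(latents, gate_logits, present):
    """Fuse per-modality latents with weights normalized over present modalities.

    latents:     (batch, n_modalities, dim) latent vectors, e.g. encoder means.
    gate_logits: (batch, n_modalities) gate scores (learned in a real model).
    present:     (batch, n_modalities) boolean availability mask.
    Returns (fused, weights) with shapes (batch, dim) and (batch, n_modalities).
    """
    present = np.asarray(present, dtype=bool)
    gate_logits = np.asarray(gate_logits, dtype=float)
    # Zero out latents of absent modalities so placeholders cannot leak in.
    latents = np.where(present[..., None], np.asarray(latents, dtype=float), 0.0)

    # Masked softmax: absent modalities get zero weight.
    masked = np.where(present, gate_logits, -1e30)
    shifted = masked - masked.max(axis=1, keepdims=True)  # numerical stability
    scores = np.exp(shifted) * present
    total = scores.sum(axis=1, keepdims=True)
    # Patients with no modality at all receive all-zero weights.
    weights = np.divide(scores, total, out=np.zeros_like(scores), where=total > 0)

    fused = np.einsum("bm,bmd->bd", weights, latents)
    return fused, weights

# One patient, three modalities, the third one missing.
latents = np.array([[[1.0, 1.0], [3.0, 3.0], [100.0, 100.0]]])
gate_logits = np.zeros((1, 3))
present = np.array([[True, True, False]])
fused, weights = gated_fusion(latents, gate_logits, present)
```

Because the weights are renormalized over whatever is available, the fused vector stays on the same scale whether a patient has one modality or all three.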
