[2604.03635] A Generative Foundation Model for Multimodal Histopathology
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.03635 (cs)
[Submitted on 4 Apr 2026]

Title: A Generative Foundation Model for Multimodal Histopathology
Authors: Jinxi Xiang, Mingjie Li, Siyu Hou, Yijiang Chen, Xiangde Luo, Yuanfeng Ji, Xiang Zhou, Ehsan Adeli, Akshay Chaudhari, Curtis P. Langlotz, Kilian M. Pohl, Ruijiang Li

Abstract: Accurate diagnosis and treatment of complex diseases require integrating histological, molecular, and clinical data, yet in practice these modalities are often incomplete owing to tissue scarcity, assay cost, and workflow constraints. Existing computational approaches attempt to impute missing modalities from available data but rely on task-specific models trained on narrow, single source-target pairs, limiting their generalizability. Here we introduce MuPD (Multimodal Pathology Diffusion), a generative foundation model that embeds hematoxylin and eosin (H&E)-stained histology, molecular RNA profiles, and clinical text into a shared latent space through a diffusion transformer with decoupled cross-modal attention. Pretrained on 100 million histology image patches, 1.6 million text-histology pairs, and 10.8 million RNA-histology pairs spanning 34 human organs, MuPD supports diverse cross-modal synthesis tasks with minimal or no task-specific fine-tuning. For text-conditi...
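The abstract's "decoupled cross-modal attention" is not specified further in the text above, but a common reading is that the latent (histology) tokens query each conditioning modality through its own key/value projection, so any subset of modalities can be supplied. The sketch below is a hypothetical, simplified illustration of that idea in numpy (single head, no normalization or residuals); all class and parameter names are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Plain scaled dot-product attention."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

class DecoupledCrossModalAttention:
    """Hypothetical sketch: one shared query projection for the latent
    tokens, but separate key/value projections per conditioning modality.
    Per-modality attention outputs are summed, so absent modalities are
    simply skipped rather than imputed at the attention level."""
    def __init__(self, d, modalities, rng):
        scale = 1.0 / np.sqrt(d)
        self.Wq = rng.standard_normal((d, d)) * scale
        self.Wk = {m: rng.standard_normal((d, d)) * scale for m in modalities}
        self.Wv = {m: rng.standard_normal((d, d)) * scale for m in modalities}

    def __call__(self, latent, cond):
        # latent: (n_latent, d); cond: dict of modality -> (n_tokens, d)
        q = latent @ self.Wq
        out = np.zeros_like(latent)
        for m, tokens in cond.items():  # only the modalities present
            out += attention(q, tokens @ self.Wk[m], tokens @ self.Wv[m])
        return out

rng = np.random.default_rng(0)
attn = DecoupledCrossModalAttention(d=16, modalities=("rna", "text"), rng=rng)
latent = rng.standard_normal((8, 16))
# Conditioning with only RNA tokens: the text branch is skipped entirely.
out = attn(latent, {"rna": rng.standard_normal((4, 16))})
```

Decoupling the key/value projections this way keeps each modality's conditioning pathway independent, which is one plausible mechanism behind the abstract's claim that the model handles incomplete modality sets without task-specific retraining.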