[2511.22294] Structure is Supervision: Multiview Masked Autoencoders for Radiology
arXiv:2511.22294 (cs), Computer Science: Computer Vision and Pattern Recognition
Submitted on 27 Nov 2025 (v1), last revised 2 Apr 2026 (this version, v4)
Authors: Sonia Laguna, Andrea Agostini, Alain Ryser, Samuel Ruiperez-Campillo, Irene Cannistraci, Moritz Vandenhirtz, Stephan Mandt, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

Abstract: Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages the natural multi-view organization of radiology studies to learn view-invariant and disease-relevant representations. MVMAE combines masked image reconstruction with cross-view alignment, transforming clinical redundancy across projections into a powerful self-supervisory signal. We further extend this approach with MVMAE-V2T, which incorporates radiology reports as an auxiliary text-based learning signal to enhance semantic grounding while preserving fully vision-based inference. Evaluated on a downstream disease classification task on three large-scale public datasets, MIMIC-CXR, CheXpert, and PadChest, MVMAE con...
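The abstract says MVMAE combines masked image reconstruction with cross-view alignment, but it does not give the exact losses. The sketch below is only an illustration under stated assumptions: an MAE-style mean-squared error computed on masked patches of each view, plus a symmetric InfoNCE contrastive term that pulls together pooled embeddings of two views (for example frontal and lateral) from the same study. The loss choices, the temperature, the weight `align_weight`, and every function name here are hypothetical and are not taken from the paper; the encoder, decoder, and the MVMAE-V2T text branch are omitted.

```python
import numpy as np

# Hypothetical sketch of a multiview masked-autoencoder objective.
# ASSUMPTION: MSE on masked patches + symmetric InfoNCE across views;
# the paper's actual losses and weights may differ.

def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask with True for patches hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask

def masked_reconstruction_loss(pred, target, mask):
    """MAE-style loss: per-patch MSE, averaged over masked patches only.

    pred, target: (num_patches, patch_dim); mask: (num_patches,) bool.
    """
    per_patch = np.mean((pred - target) ** 2, axis=-1)
    return per_patch[mask].mean()

def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def cross_view_infonce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE where row i of z_a and row i of z_b share a study.

    z_a, z_b: (batch, dim) pooled embeddings of two views.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    idx = np.arange(len(z_a))
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # view a -> view b
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # view b -> view a
    return 0.5 * (loss_ab + loss_ba)

def multiview_objective(recon_terms, z_a, z_b, align_weight=1.0):
    """Hypothetical total: sum of per-view reconstruction + weighted alignment.

    recon_terms: iterable of (pred, target, mask) tuples, one per view.
    """
    rec = sum(masked_reconstruction_loss(p, t, m) for p, t, m in recon_terms)
    return rec + align_weight * cross_view_infonce(z_a, z_b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_patches, patch_dim, batch, dim = 16, 8, 4, 32
    mask = random_patch_mask(num_patches, 0.75, rng)
    target = rng.normal(size=(num_patches, patch_dim))
    pred = target + 0.1 * rng.normal(size=target.shape)
    z_frontal = rng.normal(size=(batch, dim))
    z_lateral = z_frontal + 0.05 * rng.normal(size=(batch, dim))
    print(float(multiview_objective([(pred, target, mask)], z_frontal, z_lateral)))
```

In this sketch the reconstruction term only scores hidden patches, as in standard masked autoencoders, while the contrastive term treats other studies in the batch as negatives; other alignment losses (for example a cosine or regression target between views) would fit the abstract's description equally well.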