[2602.12317] Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement
Summary
The paper presents RaSD (Randomized Synthesis and Disentanglement), a framework for pre-training medical image foundation models entirely on synthetic data; across 6 imaging modalities and 56 downstream tasks, the resulting models consistently outperform training from scratch.
Why It Matters
This research addresses the scarcity, heterogeneity, and high cost of large-scale annotated datasets in medical imaging by pre-training on synthetic data alone. Because no patient images are required for pre-training, the approach is inherently privacy-preserving and offers a scalable path toward broader clinical applicability.
Key Takeaways
- RaSD utilizes randomized synthesis and disentanglement for effective model training.
- Pre-training on synthetic data can outperform training from scratch across all evaluated downstream tasks.
- The framework supports robust representation learning across various imaging modalities.
- RaSD demonstrates a scalable approach that can be applied to diverse clinical datasets.
- This research paves the way for privacy-preserving AI solutions in healthcare.
Paper Details
arXiv:2602.12317 (q-bio) — Quantitative Biology > Quantitative Methods
Submitted on 12 Feb 2026
Title: Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement
Authors: Yuhan Wei, Yuting He, Linshan Wu, Fuxiang Huang, Junlin Hou, Hao Chen
Abstract: Medical image foundation models (MIFMs) have demonstrated remarkable potential for a wide range of clinical tasks, yet their development is constrained by the scarcity, heterogeneity, and high cost of large-scale annotated datasets. Here, we propose RaSD (Randomized Synthesis and Disentanglement), a scalable framework for pre-training MIFMs entirely on synthetic data. By modeling anatomical structures and appearance variations with randomized Gaussian distributions, RaSD exposes models to sufficient multi-scale structural and appearance perturbations, forcing them to rely on invariant and task-relevant anatomical cues rather than dataset-specific textures, thereby enabling robust and transferable representation learning. We pre-trained RaSD on 1.2 million 3D volumes and 9.6 million 2D images, and extensively evaluated the resulting models across 6 imaging modalities, 48 datasets, and 56 downstream tasks. Across all evaluated downstream tasks, RaSD consistently outperforms training-from-scratch...
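The abstract's core idea, randomizing both structure and appearance with Gaussian distributions so that only anatomical shape cues remain stable, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the blob-based anatomy model, the function name synthesize_volume, and all parameter ranges are assumptions chosen for the example.

```python
# Minimal sketch of Gaussian-randomized synthesis in the spirit of the RaSD
# abstract. The blob-based "anatomy" model and all parameter ranges below are
# illustrative assumptions, not the authors' implementation.
import numpy as np

def synthesize_volume(shape=(64, 64, 64), n_structures=8, rng=None):
    """Sample a label map of random Gaussian blobs (stand-ins for anatomical
    structures), then render it with a randomized per-label appearance model."""
    rng = np.random.default_rng() if rng is None else rng
    grid = np.stack(
        np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"), axis=-1
    ).astype(np.float32)

    # Structure: each region is an anisotropic Gaussian blob with a random
    # center and random multi-scale extent; the strongest blob wins per voxel.
    labels = np.zeros(shape, dtype=np.int64)
    best = np.full(shape, 1e-6, dtype=np.float32)
    for k in range(1, n_structures + 1):
        center = np.array([rng.uniform(0, s) for s in shape])
        scales = rng.uniform(3.0, 12.0, size=3)  # randomized structural scale
        d2 = (((grid - center) / scales) ** 2).sum(axis=-1)
        blob = np.exp(-0.5 * d2)
        mask = blob > best
        labels[mask] = k
        best = np.maximum(best, blob)

    # Appearance: per-label intensities drawn from randomized Gaussians, so
    # texture carries no stable signal and only shape cues remain invariant.
    means = rng.uniform(0.0, 1.0, size=n_structures + 1)
    stds = rng.uniform(0.01, 0.15, size=n_structures + 1)
    image = rng.normal(means[labels], stds[labels]).astype(np.float32)
    return image, labels

# Example: each call yields a fresh (image, labels) pair, so a pre-training
# loop can stream unlimited synthetic volumes with free dense supervision.
image, labels = synthesize_volume()
```

Because structure and appearance are sampled independently, every epoch sees new texture statistics over a shifting population of shapes, which is what pushes the encoder toward the texture-invariant, anatomy-sensitive representations the abstract describes.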