[2507.19575] Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?
Summary
This paper examines whether assuming exchangeability, rather than the traditional i.i.d. assumption, better handles data distribution shifts when pooling datasets for medical image segmentation, particularly in data-scarce settings.
Why It Matters
Data scarcity is a critical issue in medical imaging, affecting the performance of deep learning models. This research provides insights into improving model robustness by proposing a new framework that could enhance segmentation accuracy, which is vital for clinical applications.
Key Takeaways
- Exchangeability offers a more practical approach than i.i.d. for data pooling in multi-source contexts.
- The proposed method, which draws on causal frameworks, controls foreground-background feature discrepancies in deep networks.
- The research demonstrates state-of-the-art segmentation performance on multiple medical imaging datasets.
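
For readers unfamiliar with the distinction, the textbook definition of exchangeability can be written as below. This is a general statement and is not taken from the paper; the symbols X_i (samples), \pi (a permutation), \theta (a latent source-level parameter) and \mu (its mixing distribution) are notation assumed here for illustration, and the second formula assumes densities exist.

```latex
% Exchangeability: the joint distribution is unchanged by reordering the samples.
\[
  (X_1, \dots, X_n) \overset{d}{=} (X_{\pi(1)}, \dots, X_{\pi(n)})
  \quad \text{for every permutation } \pi \text{ of } \{1, \dots, n\}.
\]
% Every i.i.d. sequence is exchangeable, but not conversely. For example, samples
% that are i.i.d. only conditionally on a latent parameter \theta are exchangeable
% yet generally dependent:
\[
  p(x_1, \dots, x_n) = \int \prod_{i=1}^{n} p(x_i \mid \theta) \, \mathrm{d}\mu(\theta).
\]
```

Read this way, exchangeability is the weaker assumption: it requires only that the order of the pooled samples carries no information, not that the samples are independent.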
Computer Science > Computer Vision and Pattern Recognition
arXiv:2507.19575 (cs)
[Submitted on 25 Jul 2025 (v1), last revised 23 Feb 2026 (this version, v2)]
Title: Is Exchangeability better than I.I.D to handle Data Distribution Shifts while Pooling Data for Data-scarce Medical image segmentation?
Authors: Ayush Roy, Samin Enam, Jun Xia, Won Hwa Kim, Vishnu Suresh Lokhande
Abstract: Data scarcity is a major challenge in medical imaging, particularly for deep learning models. While data pooling (combining datasets from multiple sources) and data addition (adding more data from a new dataset) have been shown to enhance model performance, they are not without complications. Specifically, increasing the size of the training dataset through pooling or addition can induce distributional shifts, negatively affecting downstream model performance, a phenomenon known as the "Data Addition Dilemma". While the traditional i.i.d. assumption may not hold in multi-source contexts, assuming exchangeability across datasets provides a more practical framework for data pooling. In this work, we investigate medical image segmentation under these conditions, drawing insights from causal frameworks to propose a method for controlling foreground-background feature discrepancies across all lay...