[2602.10359] Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT
Summary
This study evaluates foundation models for detecting traumatic bowel injury on abdominal trauma CT and finds that their specificity deficits are associated with heterogeneity in the negative class rather than with prevalence alone.
Why It Matters
Understanding the limitations of foundation models in clinical settings is crucial for improving diagnostic accuracy in high-stakes environments like trauma care. This research highlights the need for adaptation and further training to enhance model specificity, which is vital for patient safety and effective treatment.
Key Takeaways
- Foundation models discriminate about as well as task-specific models but have lower specificity when detecting traumatic bowel injury on abdominal trauma CT.
- Specificity deficits in foundation models are primarily driven by the heterogeneity of the negative class.
- Training with labeled data can progressively reduce susceptibility to negative-class heterogeneity.
- High sensitivity in foundation models may not compensate for lower specificity in clinical applications.
- Adaptation of foundation models is necessary before their implementation in real-world clinical settings.
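To make the takeaways about specificity and negative-class heterogeneity concrete, the following is a minimal sketch of how pooled specificity can hide a subgroup effect. It is not the paper's evaluation code. The toy labels, the subgroup names (`none`, `other`), and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def specificity_by_negative_subgroup(y_true, y_pred, subgroup):
    """Compute specificity separately within each subgroup of negative cases."""
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [true negatives, negatives]
    for t, p, g in zip(y_true, y_pred, subgroup):
        if t == 0:
            counts[g][1] += 1
            if p == 0:
                counts[g][0] += 1
    return {g: tn / n for g, (tn, n) in counts.items()}


# Hypothetical toy data: 2 positives, 4 negatives with no other finding
# ("none"), and 4 negatives with some other finding ("other").
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
subgroup = ["pos", "pos", "none", "none", "none", "none",
            "other", "other", "other", "other"]

sens, spec = sensitivity_specificity(y_true, y_pred)
per_group = specificity_by_negative_subgroup(y_true, y_pred, subgroup)
print(f"sensitivity={sens:.3f} pooled specificity={spec:.3f}")
print(per_group)
```

In this toy example the pooled specificity (0.625) is the average of the two subgroup values (1.0 and 0.25), weighted by each subgroup's share of the negatives. Changing the composition of the negative class would therefore shift pooled specificity even if the model and the disease prevalence stayed the same.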
arXiv:2602.10359 (eess), Electrical Engineering and Systems Science > Image and Video Processing
Submitted on 10 Feb 2026 (v1), last revised 25 Feb 2026 (this version, v2)
Authors: Jineel H Raythatha, Shuchang Ye, Jeremy Hsu, Jinman Kim
Abstract
Purpose: Translating foundation models into clinical practice requires evaluating their performance under compound distribution shift, where severe class imbalance coexists with heterogeneous imaging appearances. This challenge is relevant for traumatic bowel injury, a rare but high-mortality diagnosis. We investigated whether specificity deficits in foundation models are associated with heterogeneity in the negative class.
Methods: This retrospective study used the multi-institutional, RSNA Abdominal Traumatic Injury CT dataset (2019-2023), comprising scans from 23 centres. Two foundation models (MedCLIP, zero-shot; RadDINO, linear probe) were compared against three task-specific approaches (CNN, Transformer, Ensemble). Models were trained on 3,147 patients (2.3% bowel injury prevalence) and evaluated on an enriched 100-patient test set. To isolate negative-class effects, specificity was assessed in patients without bowel inj...