[2602.14498] Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
Summary
This paper presents a novel uncertainty-aware multimodal segmentation framework that integrates radiological images and clinical text to enhance medical diagnosis accuracy.
Why It Matters
Modeling uncertainty explicitly can make medical image segmentation more reliable, especially in complex clinical cases with poor image quality. This research targets segmentation techniques that can handle such ambiguity, which could in turn support better patient outcomes.
Key Takeaways
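- Introduces a framework that combines radiological images and clinical text for medical segmentation, fusing the two through a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix).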
- Proposes the Spectral-Entropic Uncertainty (SEU) loss, which captures spatial overlap, spectral consistency, and predictive uncertainty in a single objective to guide learning under ambiguity.
- Reports better performance and efficiency than existing state-of-the-art methods on publicly available medical datasets, including QATA-COVID19 and MosMed++.
- Highlights the importance of uncertainty modeling in enhancing the reliability of medical imaging tasks.
- Emphasizes the need for structured modality alignment in vision-language applications.
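The paper's abstract describes the SEU loss as jointly capturing spatial overlap, spectral consistency, and predictive uncertainty, but this page does not give its formula. The sketch below is only one plausible reading of that description, not the authors' formulation: it assumes a soft Dice term for overlap, a difference of 2-D FFT magnitudes for spectral consistency, and mean binary entropy for predictive uncertainty. The function names and the weights `w_spec` and `w_ent` are hypothetical.

```python
import numpy as np


def dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss (assumed overlap term): 1 - 2|P*T| / (|P| + |T|)."""
    inter = np.sum(prob * target)
    return float(1.0 - (2.0 * inter + eps) / (np.sum(prob) + np.sum(target) + eps))


def spectral_loss(prob, target):
    """Assumed spectral term: mean absolute gap between FFT magnitude spectra."""
    mag_pred = np.abs(np.fft.fft2(prob, norm="ortho"))
    mag_true = np.abs(np.fft.fft2(target, norm="ortho"))
    return float(np.mean(np.abs(mag_pred - mag_true)))


def entropy_term(prob, eps=1e-6):
    """Assumed uncertainty term: mean binary entropy of per-pixel probabilities."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))))


def seu_like_loss(prob, target, w_spec=0.1, w_ent=0.1):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (
        dice_loss(prob, target)
        + w_spec * spectral_loss(prob, target)
        + w_ent * entropy_term(prob)
    )


# Toy example: an 8x8 mask with a 4x4 foreground square.
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0
confident = target.copy()          # prediction that matches the mask exactly
uncertain = np.full((8, 8), 0.5)   # maximally uncertain prediction

print(seu_like_loss(confident, target))
print(seu_like_loss(uncertain, target))
```

Under these assumptions, a prediction that matches the mask exactly scores close to zero, while the uniform 0.5 prediction is penalised by both the overlap and the entropy terms. In a real training setup the same terms would be written in an autodiff framework so gradients can flow through them.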
arXiv:2602.14498 (cs), Computer Science > Computer Vision and Pattern Recognition
[Submitted on 16 Feb 2026]
COVID-19 e-print. Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.
Title: Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
Authors: Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma
Abstract: We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. In complex clinical circumstances with poor image quality, this formulation improves model reliability. Extensive experiments on various publicly available medical datasets, QATA-COVID19, MosMed++,...