[2505.17779] U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Summary
The paper introduces U2-BENCH, a benchmark for evaluating large vision-language models (LVLMs) on ultrasound understanding across classification, detection, regression, and text generation tasks.
Why It Matters
Ultrasound is widely used in healthcare, yet the performance of LVLMs on it remains largely unexplored. U2-BENCH provides a structured framework for assessing LVLMs on ultrasound imaging. By identifying the strengths and weaknesses of current models, it aims to support the application of AI in medical diagnostics.
Key Takeaways
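- U2-BENCH evaluates LVLMs on 8 clinically inspired tasks in ultrasound imaging, such as diagnosis, view recognition, lesion localization, clinical value estimation, and report generation.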
- The benchmark includes 7,241 cases covering 15 anatomical regions.
- Results show strong performance in image classification but challenges in spatial reasoning and language generation.
- The framework aims to accelerate research in multimodal medical imaging.
- U2-BENCH is the first comprehensive benchmark for LVLMs in ultrasound understanding.
arXiv:2505.17779 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 23 May 2025 (v1), last revised 23 Feb 2026 (this version, v4)
Title: U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
Authors: Anjie Le, Henan Liu, Yue Wang, Zhenyu Liu, Rongkun Zhu, Taohan Weng, Jinze Yu, Boyang Wang, Yalun Wu, Kaiwen Yan, Quanlin Sun, Meirui Jiang, Jialun Pei, Siya Liu, Haoyun Zheng, Zhoujun Li, Alison Noble, Jacques Souquet, Xiaoqing Guo, Manxi Lin, Hongcheng Guo
Abstract: Ultrasound is a widely-used imaging modality critical to global healthcare, yet its interpretation remains challenging due to its varying image quality on operators, noises, and anatomical structures. Although large vision-language models (LVLMs) have demonstrated impressive multimodal capabilities across natural and medical domains, their performance on ultrasound remains largely unexplored. We introduce U2-BENCH, the first comprehensive benchmark to evaluate LVLMs on ultrasound understanding across classification, detection, regression, and text generation tasks. U2-BENCH aggregates 7,241 cases spanning 15 anatomical regions and defines 8 clinically inspired tasks, such as diagnosis, view recognition, lesion localization, clinical value estimation, and report generation, across 50 ultrasound application scenar...