[2602.14879] CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography
Summary
CT-Bench is a benchmark dataset for multimodal lesion understanding in CT scans. It pairs 20,335 annotated lesions from 7,795 CT studies with a visual question answering (VQA) benchmark of 2,850 QA pairs.
Why It Matters
Progress in AI for CT is limited by the scarcity of publicly available datasets with lesion-level annotations. CT-Bench targets that gap by providing lesion bounding boxes, descriptions, and size information, together with a benchmark on which multimodal models are compared against radiologist assessments.
Key Takeaways
- CT-Bench includes 20,335 lesions from 7,795 CT studies, annotated with bounding boxes, descriptions, and size information.
- The dataset features a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization.
- Fine-tuning models on CT-Bench significantly improves performance, demonstrating its clinical utility.
- Hard negative examples are included to reflect real-world diagnostic challenges.
- CT-Bench serves as a valuable resource for researchers and practitioners in the field of medical imaging and AI.
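The two components listed above can be pictured as simple record types plus a per-task scorer. The following is a minimal sketch only: the class names, field names, units, task labels, and the exact-match scoring rule are all assumptions made for illustration and are not taken from the CT-Bench release or its evaluation protocol.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# The four task areas come from the abstract; these string labels are hypothetical.
TASKS = (
    "localization",
    "description",
    "size_estimation",
    "attribute_categorization",
)


@dataclass(frozen=True)
class LesionRecord:
    """One lesion entry (hypothetical schema, not the dataset's actual format)."""
    study_id: str
    bbox: tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max) in pixels
    description: str
    size_mm: Optional[float] = None  # assumed unit: millimetres


@dataclass(frozen=True)
class QAItem:
    """One question-answer pair (hypothetical schema)."""
    task: str
    question: str
    answer: str


def per_task_accuracy(items: list[QAItem], predictions: list[str]) -> dict[str, float]:
    """Case-insensitive exact-match accuracy per task.

    Exact match is an assumed scoring rule chosen for simplicity.
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.task] += 1
        if pred.strip().lower() == item.answer.strip().lower():
            correct[item.task] += 1
    return {task: correct[task] / total[task] for task in total}
```

Reporting accuracy per task rather than one pooled number keeps a weak task (for example, size estimation) from being hidden by a strong one.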
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14879 (cs)
[Submitted on 16 Feb 2026]
Title: CT-Bench: A Benchmark for Multimodal Lesion Understanding in Computed Tomography
Authors: Qingqing Zhu, Qiao Jin, Tejas S. Mathai, Yin Fang, Zhizheng Wang, Yifan Yang, Maame Sarfo-Gyamfi, Benjamin Hou, Ran Gu, Praveen T. S. Balamuralikrishna, Kenneth C. Wang, Ronald M. Summers, Zhiyong Lu
Abstract: Artificial intelligence (AI) can automatically delineate lesions on computed tomography (CT) and generate radiology report content, yet progress is limited by the scarcity of publicly available CT datasets with lesion-level annotations. To bridge this gap, we introduce CT-Bench, a first-of-its-kind benchmark dataset comprising two components: a Lesion Image and Metadata Set containing 20,335 lesions from 7,795 CT studies with bounding boxes, descriptions, and size information, and a multitask visual question answering benchmark with 2,850 QA pairs covering lesion localization, description, size estimation, and attribute categorization. Hard negative examples are included to reflect real-world diagnostic challenges. We evaluate multiple state-of-the-art multimodal models, including vision-language and medical CLIP variants, by comparing their performance to radiologist assessments, demonstratin...