[2512.17396] RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
Computer Science > Computer Vision and Pattern Recognition
arXiv:2512.17396 (cs)
[Submitted on 19 Dec 2025 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering
Authors: Léo Butsanets, Charles Corbière, Julien Khlaut, Pierre Manceron, Corentin Dancette

Abstract: In this work, we introduce RadImageNet-VQA, a large-scale dataset designed to advance radiologic visual question answering (VQA) on CT and MRI exams. Existing medical VQA datasets are limited in scale, dominated by X-ray imaging or biomedical illustrations, and often prone to text-based shortcuts. RadImageNet-VQA is built from expert-curated annotations and provides 750K images paired with 7.5M question-answer samples. It covers three key tasks (abnormality detection, anatomy recognition, and pathology identification) spanning eight anatomical regions and 97 pathology categories, and it supports open-ended, closed-ended, and multiple-choice questions. Extensive experiments show that state-of-the-art vision-language models still struggle with fine-grained pathology identification, particularly in open-ended settings and even after fine-tuning. Text-only analysis further reveals that model performance collapses to near-random without image ...
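The abstract's text-only ablation (performance "collapses to near-random" without the image) can be made concrete with a small sketch. The field names and file path below are hypothetical, since the abstract does not specify the dataset's actual schema; the sketch only illustrates the multiple-choice format and what chance-level accuracy means for it.

```python
# Hypothetical RadImageNet-VQA-style sample; the real field names are
# not given in the abstract and are assumptions for illustration.
sample = {
    "image": "ct_abdomen_000123.png",        # hypothetical image path
    "task": "pathology_identification",       # one of the three tasks
    "question": "Which pathology is shown in this CT slice?",
    "options": ["cyst", "hemangioma", "metastasis", "no abnormality"],
    "answer": "cyst",
}

def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing on a
    multiple-choice question with num_options choices."""
    return 1.0 / num_options

# A text-only model whose accuracy is close to this value has
# "collapsed to near-random": the question text alone carries no
# exploitable shortcut, so the image is required to answer.
baseline = chance_accuracy(len(sample["options"]))
print(baseline)  # 0.25 for a 4-option question
```

Comparing a model's text-only accuracy against this chance baseline is a standard way to detect text-based shortcuts: a large gap above chance without the image suggests the questions leak their answers.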