[2604.00493] A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation
Computer Science > Computer Vision and Pattern Recognition

arXiv:2604.00493 (cs)

[Submitted on 1 Apr 2026]

Title: A Reasoning-Enabled Vision-Language Foundation Model for Chest X-ray Interpretation

Authors: Yabin Zhang, Chong Wang, Yunhe Gao, Jiaming Liu, Maya Varma, Justin Xu, Sophie Ostmeier, Jin Long, Sergios Gatidis, Seena Dehkharghani, Arne Michalson, Eun Kyoung Hong, Christian Bluethgen, Haiwei Henry Guo, Alexander Victor Ortiz, Stephan Altmayer, Sandhya Bodapati, Joseph David Janizek, Ken Chang, Jean-Benoit Delbrouck, Akshay S. Chaudhari, Curtis P. Langlotz

Abstract: Chest X-rays (CXRs) are among the most frequently performed imaging examinations worldwide, yet rising imaging volumes increase radiologist workload and the risk of diagnostic error. Although artificial intelligence (AI) systems have shown promise for CXR interpretation, most generate only final predictions without making explicit how visual evidence is translated into radiographic findings and diagnoses. We present CheXOne, a reasoning-enabled vision-language model for CXR interpretation. CheXOne jointly generates diagnostic predictions and explicit, clinically grounded reasoning traces that connect visual evidence and radiographic findings to those predictions. The model is trained on 14.7 million instruction and...
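The abstract describes a model that emits a structured output: a reasoning trace (visual evidence, then findings) followed by diagnostic predictions. The paper page names no public API or checkpoint, so the sketch below is purely illustrative: it assumes a hypothetical HuggingFace-style vision-to-text checkpoint (the id "chexone/placeholder" and the prompt format are invented for illustration) and shows only the general shape of such an inference call.

```python
# Minimal sketch of reasoning-enabled CXR interpretation as described in the
# abstract. ASSUMPTIONS: the checkpoint id "chexone/placeholder" and the
# prompt wording are hypothetical; no official release is named on this page.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "chexone/placeholder"  # hypothetical checkpoint id

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID)

image = Image.open("frontal_cxr.png").convert("RGB")
prompt = (
    "Interpret this chest X-ray. First describe the visual evidence, "
    "then the radiographic findings, then the diagnostic predictions."
)

# Encode image + instruction, generate the joint reasoning trace and
# predictions as a single text sequence, and decode it.
inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```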