[2505.06646] Reproducing and Improving CheXNet: Deep Learning for Chest X-ray Disease Classification
Summary
This article discusses the reproduction and improvement of CheXNet, a deep learning model for classifying 14 thoracic diseases in chest X-rays, trained and evaluated on the NIH ChestX-ray14 dataset.
Why It Matters
As deep learning becomes integral to medical imaging, this research highlights advancements in disease classification accuracy, which can improve diagnostic processes and patient outcomes in healthcare. Understanding and improving models like CheXNet is crucial for the future of automated medical diagnostics.
Key Takeaways
- The CheXNet model was reproduced and improved upon using the NIH ChestX-ray14 dataset.
- The best-performing model achieved an average AUC-ROC score of 0.85 and an F1 score of 0.39.
- F1 score and AUC-ROC are essential metrics for assessing imbalanced, multi-label classification tasks in medical imaging.
- The study emphasizes the importance of deep learning in enhancing diagnostic accuracy for chest diseases.
- Continued research in this area is vital for integrating AI into standard medical practices.
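The averaged scores in the takeaways are macro averages: AUC-ROC and F1 are computed per disease label and then averaged across the 14 labels. The sketch below illustrates that computation in pure Python with made-up toy labels and scores (three labels, four images) rather than data from the paper; the averaging logic is the standard one, not the authors' exact evaluation code.

```python
# Sketch: macro-averaged AUC-ROC and F1 for multi-label classification,
# as used to evaluate models on ChestX-ray14 (14 binary labels per image).
# The toy labels/scores below are illustrative, not data from the paper.

def auc_roc(y_true, y_score):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both classes present")
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy example: 3 of the 14 disease labels, 4 images each.
labels = {
    "Atelectasis":  ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    "Cardiomegaly": ([0, 0, 1, 1], [0.1, 0.3, 0.8, 0.7]),
    "Effusion":     ([1, 1, 0, 0], [0.7, 0.4, 0.2, 0.5]),
}

aucs, f1s = [], []
for name, (y_true, y_score) in labels.items():
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5
    aucs.append(auc_roc(y_true, y_score))
    f1s.append(f1(y_true, y_pred))

macro_auc = sum(aucs) / len(aucs)
macro_f1 = sum(f1s) / len(f1s)
print(f"macro AUC-ROC = {macro_auc:.3f}, macro F1 = {macro_f1:.3f}")
# → macro AUC-ROC = 0.917, macro F1 = 0.833
```

Note that AUC-ROC is threshold-free (it ranks scores), while F1 depends on the chosen decision threshold, which is one reason the two numbers for the same model can look so different (0.85 vs. 0.39 in the paper).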
Electrical Engineering and Systems Science > Image and Video Processing
arXiv:2505.06646 (eess.IV) [Submitted on 10 May 2025 (v1), last revised 24 Feb 2026 (this version, v3)]
Title: Reproducing and Improving CheXNet: Deep Learning for Chest X-ray Disease Classification
Authors: Daniel J. Strick, Carlos Garcia, Anthony Huang, Thomas Gardos
Abstract: Deep learning for radiologic image analysis is a rapidly growing field in biomedical research and is likely to become a standard practice in modern medicine. On the publicly available NIH ChestX-ray14 dataset, containing X-ray images that are classified by the presence or absence of 14 different diseases, we reproduced an algorithm known as CheXNet, as well as explored other algorithms that outperform CheXNet's baseline metrics. Model performance was primarily evaluated using the F1 score and AUC-ROC, both of which are critical metrics for imbalanced, multi-label classification tasks in medical imaging. The best model achieved an average AUC-ROC score of 0.85 and an average F1 score of 0.39 across all 14 disease classifications present in the dataset.
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2505.06646 [eess.IV]
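The multi-label setup described in the abstract treats each of the 14 findings as an independent binary prediction: the model emits one logit per disease, passed through a sigmoid, and training typically minimizes a per-label binary cross-entropy. The sketch below shows that objective in pure Python with hypothetical logits and targets for a single image; it is an assumption-labeled illustration of the standard multi-label loss, not the authors' training code.

```python
import math

def sigmoid(z):
    """Map a logit to a per-label probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent disease labels."""
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)

# One image, 14 labels: hypothetical logits from a model head, 0/1 targets
# (1 = disease present). These numbers are illustrative only.
logits  = [2.0, -1.5, 0.3, -2.0, 1.0, -0.5, 0.0, -3.0,
           2.5, -1.0, 0.8, -0.2, 1.5, -2.5]
targets = [1,    0,   1,    0,   1,    0,   0,    0,
           1,    0,   1,    0,   1,    0]

loss = multilabel_bce(logits, targets)
print(f"per-image multi-label BCE: {loss:.3f}")
```

Unlike softmax classification, the sigmoid-per-label formulation lets several diseases be present in the same X-ray, which matches the ChestX-ray14 labeling scheme.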