[2603.03367] Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI
Computer Science > Computers and Society
arXiv:2603.03367 (cs)
[Submitted on 2 Mar 2026]
Title: Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI
Authors: John Wu, Zhenbang Wu, Jimeng Sun
Abstract: Our analysis of recent AI4H publications reveals that, despite a trend toward utilizing open datasets and sharing modeling code, 74% of AI4H papers still rely on private datasets or do not share their code. This is especially concerning in healthcare applications, where trust is essential. Furthermore, inconsistent and poorly documented data preprocessing pipelines result in variable model performance reports, even for identical tasks and datasets, making it challenging to evaluate the true effectiveness of AI models. Despite the challenges posed by the reproducibility crisis, addressing these issues through open practices offers substantial benefits. For instance, while the reproducibility mandate adds extra effort to research and publication, it significantly enhances the impact of the work. Our analysis shows that papers that used both public datasets and shared code received, on average, 110% more citations than those that do neither--more than doubling the citation count. Given the clear benefits of enhancing reproducibility, it is imperative for the AI4H community t...