[2511.16858] Investigating Test Overfitting on SWE-bench
Computer Science > Software Engineering
arXiv:2511.16858 (cs)
[Submitted on 20 Nov 2025 (v1), last revised 3 Apr 2026 (this version, v3)]

Title: Investigating Test Overfitting on SWE-bench
Authors: Toufique Ahmed, Jatin Ganhotra, Avraham Shinnar, Martin Hirzel

Abstract: Tests can be useful for resolving issues in code repositories. However, relying too heavily on tests for issue resolution can produce code that technically passes the observed tests yet misses important cases or even breaks functionality. This problem, called test overfitting, is exacerbated by the fact that issues usually lack readily executable tests. Instead, several issue-resolution systems use tests auto-generated from issues, which may be imperfect. Some systems even iteratively refine code and tests jointly. This paper presents the first empirical study of test overfitting in this setting.

Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG)
Cite as: arXiv:2511.16858 [cs.SE] (or arXiv:2511.16858v3 [cs.SE] for this version)
DOI: https://doi.org/10.48550/arXiv.2511.16858 (arXiv-issued DOI via DataCite)

Submission history
From: Toufique Ahmed
[v1] Thu, 20 Nov 2025 23:55:56 UTC (432 KB)
[v2] Tue, 27 Jan 2026 16:12:38 UTC (356 KB)
[v3] Fri, 3 Apr 2026 16:15:46 UTC (337 KB)
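To make the notion of test overfitting concrete, here is a minimal hypothetical sketch (not from the paper): an issue asks that a `normalize` function collapse runs of whitespace into single spaces, and a test auto-generated from the issue text covers only one narrow case. A patch can pass that test while still missing the underlying bug.

```python
def auto_generated_test(normalize):
    """A test auto-generated from the issue text: covers only one case."""
    assert normalize("a  b") == "a b"

def normalize_overfit(s):
    # An overfit "fix": replaces double spaces only, so it passes the
    # observed test but leaves tabs and newlines untouched.
    return s.replace("  ", " ")

def normalize_correct(s):
    # A correct fix: collapses every run of whitespace into one space.
    return " ".join(s.split())

# Both patches pass the auto-generated test...
auto_generated_test(normalize_overfit)
auto_generated_test(normalize_correct)

# ...but a held-out case exposes the overfitting:
assert normalize_overfit("a\tb") == "a\tb"   # tab untouched; bug remains
assert normalize_correct("a\tb") == "a b"
print("overfit patch passed the observed test but failed the held-out case")
```

The function names and the whitespace issue here are invented for illustration; the paper's study concerns real SWE-bench issues and system-generated tests, but the failure mode is the same: a narrow test admits patches that do not resolve the issue.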