[2602.13921] GREPO: A Benchmark for Graph Neural Networks on Repository-Level Bug Localization
Summary
The article presents GREPO, a benchmark for evaluating Graph Neural Networks (GNNs) in repository-level bug localization, addressing limitations of current methods and showcasing GNNs' potential.
Why It Matters
This research is significant as it fills a gap in the application of GNNs for bug localization, a critical area in software engineering. By providing a dedicated benchmark, GREPO enables more effective evaluation and advancement of GNN methodologies, potentially improving software maintenance and development processes.
Key Takeaways
- GREPO is presented as the first GNN benchmark for repository-level bug localization.
- It includes 86 Python repositories and 47,294 bug-fixing tasks.
- In the authors' evaluation, GNNs outperform established information retrieval methods on this task.
- The benchmark facilitates future research in GNN applications.
- Access to the code and data structures is provided for further exploration.
Computer Science > Machine Learning
arXiv:2602.13921 (cs)
[Submitted on 14 Feb 2026]
Title: GREPO: A Benchmark for Graph Neural Networks on Repository-Level Bug Localization
Authors: Juntong Wang, Libin Chen, Xiyuan Wang, Shijia Kang, Haotong Yang, Da Zheng, Muhan Zhang
Abstract: Repository-level bug localization, the task of identifying where code must be modified to fix a bug, is a critical software engineering challenge. Standard Large Language Models (LLMs) are often unsuitable for this task because their context windows are too small to hold an entire code repository. As a result, various retrieval methods are commonly used, including keyword matching, text similarity, and simple graph-based heuristics such as Breadth-First Search. Graph Neural Networks (GNNs) offer a promising alternative due to their ability to model complex, repository-wide dependencies; however, their application has been hindered by the lack of a dedicated benchmark. To address this gap, we introduce GREPO, the first GNN benchmark for repository-scale bug localization tasks. GREPO comprises 86 Python repositories and 47,294 bug-fixing tasks, providing graph-based data structures ready for direct GNN processing. Our evaluation of various GNN architectures shows outstanding performance compared to established info...
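The abstract names keyword matching and Breadth-First Search as common retrieval heuristics for this task. As a rough illustration of how the two might be combined, the sketch below seeds a search with keyword matches against an issue description and then expands outward over a dependency graph. It is not taken from GREPO: the toy graph, the node descriptions, and the helper names `keyword_seeds` and `bfs_candidates` are all hypothetical.

```python
from collections import deque

def keyword_seeds(issue_text, node_text):
    """Return nodes whose description shares at least one token with the issue."""
    issue_tokens = set(issue_text.lower().split())
    return [node for node, text in node_text.items()
            if issue_tokens & set(text.lower().split())]

def bfs_candidates(graph, seeds, max_hops):
    """Breadth-first expansion from the seeds; maps each reached node to its hop distance."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical repository graph: an edge means "calls or depends on".
graph = {
    "cli.main": ["parser.parse"],
    "parser.parse": ["lexer.tokenize", "ast.build"],
    "lexer.tokenize": ["utils.read_file"],
    "ast.build": [],
    "utils.read_file": [],
}
# Hypothetical one-line descriptions standing in for code or docstrings.
node_text = {
    "cli.main": "command line entry point",
    "parser.parse": "parse source into syntax tree",
    "lexer.tokenize": "split source into tokens",
    "ast.build": "build tree nodes",
    "utils.read_file": "read file from disk",
}

issue = "Crash when parse receives empty input"
seeds = keyword_seeds(issue, node_text)
reached = bfs_candidates(graph, seeds, max_hops=1)
# Rank candidates by hop distance, breaking ties alphabetically.
ranked = sorted(reached.items(), key=lambda kv: (kv[1], kv[0]))
print(ranked)
```

A heuristic like this treats every neighbor at the same distance as equally relevant, which is the kind of limitation a learned model over the same graph could plausibly address.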