[2505.07372] Self-Bootstrapping Automated Program Repair: Using LLMs

[2505.07372] Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair

arXiv - AI March 31, 2026 4 min read

About this article

Abstract page for arXiv paper 2505.07372: Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair

Computer Science > Software Engineering arXiv:2505.07372 (cs) [Submitted on 12 May 2025 (v1), last revised 29 Mar 2026 (this version, v2)] Title:Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair Authors:David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva Garcia-Lopez View a PDF of the paper titled Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair, by David de-Fitero-Dominguez and 1 other authors View PDF Abstract:This paper presents a novel methodology for enhancing Automated Program Repair (APR) through synthetic data generation utilizing Large Language Models (LLMs). Current APR systems are constrained by the limited availability of high-quality training data encompassing diverse bug types across multiple programming languages. The proposed approach addresses this limitation through a two-phase process: a synthetic sample generation followed by a rigorous quality assessment. Multiple state-of-the-art LLMs were employed to generate approximately 30,000 paired examples of buggy and fixed code across 12 programming languages and 13 bug categories. Subsequently, these samples underwent cross-model evaluation against five criteria: correctness, code quality, security, performance, and completeness. Experimental evaluation on the VulRepair test set dataset showed statistically significant improvements in Perfect Prediction rates, with the...

Originally published on March 31, 2026. Curated by AI News.

Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min · 39 minutes ago

Llms

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

Abstract page for arXiv paper 2603.23966: Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

arXiv - AI · 4 min · about 2 hours ago

Llms

[2603.16790] InCoder-32B: Code Foundation Model for Industrial Scenarios

Abstract page for arXiv paper 2603.16790: InCoder-32B: Code Foundation Model for Industrial Scenarios

arXiv - AI · 4 min · about 2 hours ago

Llms

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

Abstract page for arXiv paper 2603.16430: EngGPT2: Sovereign, Efficient and Open Intelligence

arXiv - AI · 4 min · about 2 hours ago

[2505.07372] Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair

About this article

Related Articles

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

[2603.23966] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

[2603.16790] InCoder-32B: Code Foundation Model for Industrial Scenarios

[2603.16430] EngGPT2: Sovereign, Efficient and Open Intelligence

No comments

Stay updated with AI News