[2602.22124] SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents
Summary
The paper presents SWE-Protégé, a post-training framework that turns small language models (SLMs) into capable software engineering agents by teaching them to selectively collaborate with a stronger expert model, lifting a lightly post-trained 7B model to 42.4% Pass@1 on SWE-bench Verified.
Why It Matters
This research addresses a core limitation of small language models on long-horizon software engineering tasks, where they tend to loop on unproductive actions and resolve few issues. By combining expert guidance with supervised fine-tuning and agentic reinforcement learning, it makes cheap, low-latency SLMs practical as autonomous coding agents.
Key Takeaways
- SWE-Protégé improves small language models' performance on software engineering tasks.
- The SLM remains the sole decision-maker; it learns when to seek expert guidance, how to recognize stalled states, and how to follow through on the expert's feedback.
- Lightly post-training Qwen2.5-Coder-7B-Instruct with this approach reaches 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement.
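The selective-collaboration idea in the takeaways above can be illustrated with a toy control loop. This is a hedged sketch, not the paper's actual implementation: the `policy`, `expert`, and `env` interfaces and the stall heuristic are hypothetical stand-ins. The key property it demonstrates is that the small model always chooses the action itself; the expert only supplies feedback when the agent detects that it has stalled.

```python
from collections import deque

def is_stalled(history, window=4):
    """Crude stall detector (illustrative assumption): the agent is
    considered stuck if its last `window` actions contain at most
    two distinct actions, i.e. it is looping."""
    recent = list(history)[-window:]
    return len(recent) == window and len(set(recent)) <= 2

def run_episode(policy, expert, env, max_steps=50):
    """Toy loop in the spirit of SWE-Protégé: `policy` (the SLM) stays
    the sole decision-maker, but on a detected stall it requests
    guidance from `expert` and conditions its next action on it."""
    history = deque(maxlen=8)
    obs, feedback = env.reset(), None
    for _ in range(max_steps):
        if is_stalled(history):
            feedback = expert(obs, history)   # guidance, not a takeover
        action = policy(obs, feedback)        # the SLM still picks the action
        history.append(action)
        obs, done = env.step(action)
        feedback = None                       # feedback applies to one step
        if done:
            return True
    return False
```

The design point mirrors the paper's framing: the expert is consulted selectively rather than on every step, so the system keeps the SLM's cost and latency advantages while escaping the action loops that otherwise dominate its failures.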
Computer Science > Software Engineering
arXiv:2602.22124 (cs)
[Submitted on 25 Feb 2026]

Title: SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents
Authors: Patrick Tser Jern Kon, Archana Pradeep, Ang Chen, Alexander P. Ellis, Warren Hunt, Zijian Wang, John Yang, Samuel Thompson

Abstract: Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon software engineering tasks such as SWE-bench, where they suffer from pervasive action looping and low resolution rates. We introduce SWE-Protégé, a post-training framework that reframes software repair as an expert-protégé collaboration problem. In SWE-Protégé, an SLM remains the sole decision-maker while learning to selectively seek guidance from a strong expert model, recognize stalled states, and follow through on expert feedback. Our approach combines supervised fine-tuning on expert-augmented trajectories with agentic reinforcement learning that explicitly discourages degenerative looping and unproductive expert collaboration. We lightly post-train Qwen2.5-Coder-7B-Instruct to achieve 42.4% Pass@1 on SWE-bench Verified, a +25.4% improvement...
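The abstract's reinforcement-learning objective, which "explicitly discourages degenerative looping and unproductive expert collaboration", can be sketched as reward shaping. The function below is an illustrative assumption: the paper does not publish this exact formula, and the terms, counters, and coefficients (`loop_penalty`, `call_penalty`) are hypothetical.

```python
def shaped_reward(resolved, actions, expert_calls, useful_calls,
                  loop_penalty=0.1, call_penalty=0.05):
    """Hedged sketch of a shaped episode reward in the spirit of the
    paper's agentic RL: reward task success, penalize looping actions,
    and penalize expert queries whose guidance went unused. All terms
    and coefficients here are illustrative, not the paper's."""
    r = 1.0 if resolved else 0.0
    # Immediate repeats serve as a crude signal of degenerative looping.
    loops = sum(1 for a, b in zip(actions, actions[1:]) if a == b)
    # Expert calls that produced no follow-through count as unproductive.
    wasted = max(expert_calls - useful_calls, 0)
    return r - loop_penalty * loops - call_penalty * wasted
```

Under a shaping scheme like this, an agent that resolves the issue while querying the expert only when the guidance is actually used keeps nearly the full success reward, whereas looping or spamming the expert erodes it.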