[2602.08316] SWE Context Bench: A Benchmark for Context Learning in Coding
Computer Science > Software Engineering

arXiv:2602.08316 (cs)

[Submitted on 9 Feb 2026 (v1), last revised 27 Mar 2026 (this version, v2)]

Title: SWE Context Bench: A Benchmark for Context Learning in Coding

Authors: Jared Zhu, Minhao Hu, Junde Wu

Abstract: Large language models are increasingly used as programming agents for repository-level software engineering tasks. While recent benchmarks evaluate correctness in realistic codebases, they largely treat tasks as independent and do not assess whether agents can reuse previous experience or context across related problems. As a result, the ability of agents to accumulate, retrieve, and apply prior experience, as well as the efficiency gains from such reuse, remains difficult to measure. We introduce SWE-ContextBench, a benchmark designed to explicitly evaluate context reuse in programming agents. Built on SWE-Bench Lite, SWE-Bench Multilingual, and SWE-Bench Verified, SWE-ContextBench consists of 1,100 base tasks with 376 related tasks derived from real dependency and reference relationships among GitHub issues and pull requests. SWE-ContextBench groups base tasks and related tasks with shared context across 51 unique repositories and 9 programming languages. The benchmark evaluates agents along three complementary dimensions: prediction accuracy, time efficiency, and cos...