[2602.08316] SWE Context Bench: A Benchmark for Context Learning in Coding
Computer Science > Software Engineering

arXiv:2602.08316 (cs)

[Submitted on 9 Feb 2026 (v1), last revised 27 Mar 2026 (this version, v2)]

Title: SWE Context Bench: A Benchmark for Context Learning in Coding

Authors: Jared Zhu, Minhao Hu, Junde Wu

Abstract: Large language models are increasingly used as programming agents for repository-level software engineering tasks. While recent benchmarks evaluate correctness in realistic codebases, they largely treat tasks as independent and do not assess whether agents can reuse previous experience or context across related problems. As a result, the ability of agents to accumulate, retrieve, and apply prior experience, as well as the efficiency gains from such reuse, remains difficult to measure. We introduce SWE-ContextBench, a benchmark designed to explicitly evaluate context reuse in programming agents. Built on SWE-Bench Lite, SWE-Bench Multilingual, and SWE-Bench Verified, SWE-ContextBench consists of 1,100 base tasks with 376 related tasks derived from real dependency and reference relationships among GitHub issues and pull requests. SWE-ContextBench groups base tasks and related tasks with shared context across 51 unique repositories and 9 programming languages. The benchmark evaluates agents along three complementary dimensions: prediction accuracy, time efficiency, and cos...