[2603.24755] SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Computer Science > Software Engineering
arXiv:2603.24755 (cs)
[Submitted on 25 Mar 2026]

Title: SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks

Authors: Gabriel Orlanski, Devjeet Roy, Alexander Yun, Changho Shin, Alex Gu, Albert Ge, Dyah Adila, Frederic Sala, Aws Albarghouthi

Abstract: Software development is iterative, yet agentic coding benchmarks overwhelmingly evaluate single-shot solutions against complete specifications. Code can pass the test suite but become progressively harder to extend. Recent iterative benchmarks attempt to close this gap, but constrain the agent's design decisions too tightly to faithfully measure how code quality shapes future extensions.

We introduce SlopCodeBench, a language-agnostic benchmark comprising 20 problems and 93 checkpoints, in which agents repeatedly extend their own prior solutions under evolving specifications that force architectural decisions without prescribing internal structure. We track two trajectory-level quality signals: verbosity, the fraction of redundant or duplicated code, and structural erosion, the share of complexity mass concentrated in high-complexity functions.

No agent solves any problem end-to-end across 11 models; the highest checkpoint solve rate is 17.2%. Quality degrades steadily:...
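
The two quality signals named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption for illustration: the function names `structural_erosion` and `duplicate_line_fraction` are hypothetical, the complexity threshold of 10 is arbitrary, and counting repeated lines is only a crude stand-in for whatever redundancy measure the benchmark actually uses.

```python
from collections import Counter


def structural_erosion(complexities, threshold=10):
    """Share of total complexity held by functions above `threshold`.

    `complexities` is a list of per-function complexity scores (for example
    cyclomatic complexity). The threshold of 10 is an assumption for this
    sketch, not a value taken from the paper.
    """
    total = sum(complexities)
    if total == 0:
        return 0.0
    heavy = sum(c for c in complexities if c > threshold)
    return heavy / total


def duplicate_line_fraction(source):
    """Crude verbosity proxy: fraction of non-blank lines repeating an earlier line."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    # Every occurrence after the first counts as a repeat.
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(lines)


# Hypothetical per-function complexity scores at an early and a late checkpoint.
early = [3, 4, 5, 6]
late = [3, 4, 5, 30]
print(structural_erosion(early))  # 0.0, no function exceeds the threshold
print(structural_erosion(late))   # 30 / 42, about 0.714
```

Under these assumptions, a solution whose total complexity grows mostly inside one ever-larger function shows rising erosion across checkpoints even if every test still passes, which is the kind of trajectory-level signal the abstract describes.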

Originally published on March 27, 2026. Curated by AI News.
