[2603.03329] AutoHarness: improving LLM agents by automatically

[2603.03329] AutoHarness: improving LLM agents by automatically synthesizing a code harness

arXiv - AI March 05, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.03329: AutoHarness: improving LLM agents by automatically synthesizing a code harness

Computer Science > Computation and Language arXiv:2603.03329 (cs) [Submitted on 10 Feb 2026] Title:AutoHarness: improving LLM agents by automatically synthesizing a code harness Authors:Xinghua Lou, Miguel Lázaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, Kevin P. Murphy View a PDF of the paper titled AutoHarness: improving LLM agents by automatically synthesizing a code harness, by Xinghua Lou and 5 other authors View PDF HTML (experimental) Abstract:Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited by the external environment. For example, in the recent Kaggle GameArena chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. Often people manually write "harnesses" around LLMs to prevent such failures. In this paper, we demonstrate that Gemini-2.5-Flash can automatically synthesize such a code harness, using a small number of rounds of iterative code refinement given feedback from the (game) environment. The resulting harness prevents all illegal moves in 145 different TextArena games (both 1-player and 2-player), enabling the smaller Gemini-2.5-Flash model to outperform larger models, such as Gemini-2.5-Pro. Pushing our technique to the limit, we can get Gemini-2.5-Flash to generate the entire policy in code, thus eliminating the need to use the LLM at decision making t...

Originally published on March 05, 2026. Curated by AI News.

Llms

What's your "When Language Model AI can do X, I'll be impressed"?

I have two at the top of my mind: When it can read musical notes. I will be mildly impressed when I can paste in a picture of musical not...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Llms

Google’s Gemini AI can answer your questions with 3D models and simulations

Google's latest upgrade for Gemini will allow the chatbot to generate interactive 3D models and simulations in response to your questions...

The Verge - AI · 4 min · about 8 hours ago

Llms

Moody’s Integrates AI Agents With Anthropic’s Claude

AI Tools & Products · 4 min · about 8 hours ago

Llms

AI on the couch: Anthropic gives Claude 20 hours of psychiatry

AI Tools & Products · 6 min · about 8 hours ago

[2603.03329] AutoHarness: improving LLM agents by automatically synthesizing a code harness

About this article

Related Articles

What's your "When Language Model AI can do X, I'll be impressed"?

Google’s Gemini AI can answer your questions with 3D models and simulations

Moody’s Integrates AI Agents With Anthropic’s Claude

AI on the couch: Anthropic gives Claude 20 hours of psychiatry

No comments

Stay updated with AI News