[2603.03329] AutoHarness: improving LLM agents by automatically synthesizing a code harness
About this article
Abstract page for arXiv paper 2603.03329: AutoHarness: improving LLM agents by automatically synthesizing a code harness
Computer Science > Computation and Language arXiv:2603.03329 (cs) [Submitted on 10 Feb 2026] Title:AutoHarness: improving LLM agents by automatically synthesizing a code harness Authors:Xinghua Lou, Miguel Lázaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, Kevin P. Murphy View a PDF of the paper titled AutoHarness: improving LLM agents by automatically synthesizing a code harness, by Xinghua Lou and 5 other authors View PDF HTML (experimental) Abstract:Despite significant strides in language models in the last few years, when used as agents, such models often try to perform actions that are not just suboptimal for a given state, but are strictly prohibited by the external environment. For example, in the recent Kaggle GameArena chess competition, 78% of Gemini-2.5-Flash losses were attributed to illegal moves. Often people manually write "harnesses" around LLMs to prevent such failures. In this paper, we demonstrate that Gemini-2.5-Flash can automatically synthesize such a code harness, using a small number of rounds of iterative code refinement given feedback from the (game) environment. The resulting harness prevents all illegal moves in 145 different TextArena games (both 1-player and 2-player), enabling the smaller Gemini-2.5-Flash model to outperform larger models, such as Gemini-2.5-Pro. Pushing our technique to the limit, we can get Gemini-2.5-Flash to generate the entire policy in code, thus eliminating the need to use the LLM at decision making t...