[2602.16805] Simple Baselines are Competitive with Code Evolution
Summary
The paper examines the effectiveness of two simple baselines in code evolution and shows that they match or outperform more complex methods in three domains: finding better mathematical bounds, designing agentic scaffolds, and machine learning competitions.
Why It Matters
This research highlights the potential of simpler approaches in code evolution and challenges the reliance on sophisticated pipelines. It emphasizes the importance of search-space design and evaluation methods, which could make applications of code evolution in AI more efficient and effective.
Key Takeaways
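- Simple baselines match or exceed much more sophisticated code evolution methods in all three domains tested.
- For mathematical bounds, the search space and the domain knowledge in the prompt chiefly determine a search's performance ceiling and efficiency; the code evolution pipeline is secondary.
- In agentic scaffold design, high variance across scaffolds combined with small datasets leads to suboptimal scaffolds.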
- Better evaluation methods are needed for effective code evolution.
- The study suggests avenues for improving future code evolution practices.
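The point about scaffold variance and small datasets reflects a general selection effect: when many candidates are ranked on a small, noisy evaluation set, the apparent winner is often not the truly best one. The simulation below is a minimal sketch of that effect, not the paper's experiment; the ten hypothetical scaffolds, their true accuracies, the dataset sizes, and the function `selection_regret` are all illustrative assumptions.

```python
import random
import statistics


def selection_regret(true_acc, n_examples, trials, seed=0):
    """Average gap between the best true accuracy and the true accuracy of
    the candidate that scores highest on an n_examples evaluation set."""
    rng = random.Random(seed)
    best = max(true_acc)
    regrets = []
    for _ in range(trials):
        # Observed accuracy: fraction of n_examples Bernoulli(p) successes.
        observed = [
            sum(rng.random() < p for _ in range(n_examples)) / n_examples
            for p in true_acc
        ]
        # Select the candidate that looks best on this noisy evaluation.
        chosen = max(range(len(true_acc)), key=observed.__getitem__)
        regrets.append(best - true_acc[chosen])
    return statistics.mean(regrets)


# Hypothetical pool: ten scaffolds with true accuracies 0.50, 0.51, ..., 0.59.
scaffolds = [0.50 + 0.01 * i for i in range(10)]
small = selection_regret(scaffolds, n_examples=25, trials=200)
large = selection_regret(scaffolds, n_examples=1000, trials=200)
print(f"mean regret, 25 examples:   {small:.3f}")
print(f"mean regret, 1000 examples: {large:.3f}")
```

With these settings the 25-example evaluation should show a clearly larger mean regret than the 1000-example one: at 25 examples the sampling standard deviation of an accuracy near 0.55 is roughly 0.1, which swamps the 0.01 gaps between scaffolds, so selection is close to random.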
arXiv:2602.16805 (cs) [Computer Science > Artificial Intelligence], submitted on 18 Feb 2026
Title: Simple Baselines are Competitive with Code Evolution
Authors: Yonatan Gideoni, Sebastian Risi, Yarin Gal
Abstract: Code evolution is a family of techniques that rely on large language models to search through possible computer programs by evolving or mutating existing code. Many proposed code evolution pipelines show impressive performance but are often not compared to simpler baselines. We test how well two simple baselines do over three domains: finding better mathematical bounds, designing agentic scaffolds, and machine learning competitions. We find that simple baselines match or exceed much more sophisticated methods in all three. By analyzing these results we find various shortcomings in how code evolution is both developed and used. For the mathematical bounds, a problem's search space and domain knowledge in the prompt are chiefly what dictate a search's performance ceiling and efficiency, with the code evolution pipeline being secondary. Thus, the primary challenge in finding improved bounds is designing good search spaces, which is done by domain experts, and not the search itself. When designing agentic scaffolds we find that high variance in the scaffolds coupled with small datasets leads to suboptimal scaffolds bein...
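The abstract describes code evolution as a search over programs that proceeds by mutating existing code. The sketch below shows the bare shape of such a loop under loud assumptions: in real pipelines the mutation step is performed by a large language model, whereas here a hypothetical `mutate` function simply perturbs one numeric literal so the example runs offline. The toy task (fitting y = 3x + 2), the seed program, and the names `score`, `mutate`, and `evolve` are all illustrative and are not taken from the paper.

```python
import random
import re

# Hypothetical toy task: find a program whose f(x) matches y = 3x + 2.
DATA = [(x, 3 * x + 2) for x in range(-5, 6)]
SEED_PROGRAM = "def f(x):\n    return 1.0 * x + 0.0\n"
NUMBER = re.compile(r"-?\d+\.\d+")


def score(source):
    """Negative squared error of the program's f on DATA (higher is better)."""
    namespace = {}
    exec(source, namespace)  # acceptable only because we generate the code ourselves
    f = namespace["f"]
    return -sum((f(x) - y) ** 2 for x, y in DATA)


def mutate(source, rng):
    """Stand-in for an LLM edit: perturb one numeric literal in the source."""
    m = rng.choice(list(NUMBER.finditer(source)))
    new_value = float(m.group()) + rng.gauss(0, 0.5)
    return source[: m.start()] + f"{new_value:.4f}" + source[m.end():]


def evolve(generations=300, population_size=8, seed=0):
    rng = random.Random(seed)
    population = [(score(SEED_PROGRAM), SEED_PROGRAM)]
    for _ in range(generations):
        # Pick a parent from the pool and propose a mutated child program.
        _, parent = rng.choice(population)
        child = mutate(parent, rng)
        population.append((score(child), child))
        # Truncation selection: keep only the highest-scoring programs.
        population.sort(key=lambda item: item[0], reverse=True)
        del population[population_size:]
    return population[0]


best_score, best_program = evolve()
print(best_program)
print(best_score)
```

Because this `mutate` only edits numeric literals, the toy search can never leave the family of linear functions; in miniature, this mirrors the abstract's point that the search space, rather than the search procedure, sets the performance ceiling.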