LLMs · Machine Learning · AI Agents · AI Infrastructure

[2602.16805] Simple Baselines are Competitive with Code Evolution

arXiv - Machine Learning · February 20, 2026 · 4 min read · Article

Summary

The paper examines the effectiveness of simple baselines in code evolution, demonstrating that they can match or outperform more complex methods across various domains.

Why It Matters

This research highlights the potential of simpler approaches in code evolution, challenging the reliance on sophisticated methods. It emphasizes the importance of search space design and evaluation methods, which could lead to more efficient and effective programming solutions in AI.

Key Takeaways

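Simple baselines match or exceed much more sophisticated code evolution methods in all three tested domains.
For mathematical bounds, search space design and domain knowledge in the prompt matter more than the code evolution pipeline itself.
High variance in agentic scaffolds, coupled with small datasets, leads to suboptimal scaffolds.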
Better evaluation methods are needed for effective code evolution.
The study suggests avenues for improving future code evolution practices.
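The takeaway about high variance and small datasets can be illustrated with a small simulation. This is a minimal sketch, not the paper's code: the function names, the choice of 20 candidate scaffolds, the true accuracy of 0.6, and the dataset sizes are all assumptions made for illustration. Every candidate has the same true accuracy, so any apparent winner is picked on noise alone, and the gap between its measured and true accuracy shows how optimistic the selection is.

```python
import random


def evaluate(true_acc, n_examples, rng):
    """Measured accuracy of one scaffold on n_examples independent test items."""
    hits = sum(rng.random() < true_acc for _ in range(n_examples))
    return hits / n_examples


def select_best(true_accs, n_examples, rng):
    """Score every candidate once and return (index, measured score) of the top one."""
    measured = [evaluate(p, n_examples, rng) for p in true_accs]
    best = max(range(len(true_accs)), key=measured.__getitem__)
    return best, measured[best]


def mean_selection_gap(true_accs, n_examples, trials=200, seed=0):
    """Average of (measured score - true accuracy) for the selected candidate."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best, score = select_best(true_accs, n_examples, rng)
        total += score - true_accs[best]
    return total / trials


if __name__ == "__main__":
    # Hypothetical setup: 20 scaffolds that are all equally good.
    candidates = [0.6] * 20
    for n in (20, 200):
        gap = mean_selection_gap(candidates, n_examples=n)
        print(f"n_examples={n}: selected scaffold looks {gap:.3f} better than it is")
```

Under these assumptions the optimistic gap shrinks as the evaluation set grows, which is one way to see why selecting among noisy scaffolds on a small dataset is unreliable.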

Computer Science > Artificial Intelligence
arXiv:2602.16805 (cs)
[Submitted on 18 Feb 2026]

Title: Simple Baselines are Competitive with Code Evolution

Authors: Yonatan Gideoni, Sebastian Risi, Yarin Gal

Abstract: Code evolution is a family of techniques that rely on large language models to search through possible computer programs by evolving or mutating existing code. Many proposed code evolution pipelines show impressive performance but are often not compared to simpler baselines. We test how well two simple baselines do over three domains: finding better mathematical bounds, designing agentic scaffolds, and machine learning competitions. We find that simple baselines match or exceed much more sophisticated methods in all three. By analyzing these results we find various shortcomings in how code evolution is both developed and used. For the mathematical bounds, a problem's search space and domain knowledge in the prompt are chiefly what dictate a search's performance ceiling and efficiency, with the code evolution pipeline being secondary. Thus, the primary challenge in finding improved bounds is designing good search spaces, which is done by domain experts, and not the search itself. When designing agentic scaffolds we find that high variance in the scaffolds coupled with small datasets leads to suboptimal scaffolds bein...

Related Articles

LLMs

I can't help rooting for tiny open source AI model maker Arcee | TechCrunch

Arcee is a tiny 26-person U.S. startup that built a high-performing, massive, open source LLM. And it's gaining popularity with OpenClaw ...

TechCrunch - AI · 4 min · 38 minutes ago

LLMs

Anthropic Teams Up With Its Rivals to Keep AI From Hacking Everything | WIRED

The AI lab's Project Glasswing will bring together Apple, Google, and more than 45 other organizations. They'll use the new Claude Mythos...

Wired - AI · 7 min · about 3 hours ago

LLMs

The public needs to control AI-run infrastructure, labor, education, and governance— NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

LLMs

Agents that write their own code at runtime and vote on capabilities, no human in the loop

hollowOS just hit v4.4 and I added something that I haven’t seen anyone else do. Previous versions gave you an OS for agents: structured ...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago
