[2507.19457] GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Summary

The paper introduces GEPA (Genetic-Pareto), a prompt optimizer that uses natural language reflection on execution traces to adapt large language models, achieving better sample efficiency than traditional reinforcement learning methods.

Why It Matters

As AI systems increasingly rely on reinforcement learning for task adaptation, GEPA presents a significant advancement by demonstrating that natural language can serve as a more effective learning medium. This could lead to more efficient AI training processes and better performance in various applications.

Key Takeaways

  • GEPA outperforms traditional reinforcement learning methods (such as GRPO) by 6% on average.
  • It requires up to 35 times fewer rollouts to achieve significant quality gains.
  • Natural language reflection is a more effective learning medium than sparse rewards.
  • GEPA shows promising results in code optimization tasks.
  • The code for GEPA is publicly available, encouraging further research and application.

Computer Science > Computation and Language
arXiv:2507.19457 (cs)
[Submitted on 25 Jul 2025 (v1), last revised 14 Feb 2026 (this version, v2)]

Title: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning

Authors: Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab

Abstract: Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language often provides a much richer learning medium for LLMs, compared to policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. Given any AI system containing one or more LLM prompts, GEPA samples trajectories (e.g., reasoning, tool calls, and tool outputs) and reflects on them in natural language to diagnose problems, propose and test prompt updates, and combine complementary lessons from the Pareto ...
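The loop the abstract describes, sample trajectories, reflect on failures in natural language, mutate the prompt, and select parents from a Pareto frontier of per-task scores, can be sketched in miniature. This is a hedged illustration, not the paper's implementation: `evaluate` and `reflect_and_mutate` are toy stand-ins for the LLM calls GEPA would actually make, and all function names here are hypothetical.

```python
import random

random.seed(0)

TASKS = range(4)  # four toy tasks standing in for a benchmark suite

def evaluate(prompt):
    """Score a candidate prompt on each task (toy length-based metric,
    standing in for real rollouts of the AI system)."""
    return [min(1.0, (len(prompt) % (t + 7)) / 6) for t in TASKS]

def reflect_and_mutate(prompt, scores):
    """Stand-in for natural-language reflection: in GEPA an LLM would
    diagnose failing trajectories and rewrite the prompt; here we just
    append a marker targeting the worst-scoring task."""
    worst = scores.index(min(scores))
    return prompt + f" [lesson for task {worst}]"

def pareto_front(candidates):
    """Keep candidates not dominated across all tasks by another candidate,
    so complementary lessons (good on different tasks) all survive."""
    front = []
    for p, s in candidates:
        dominated = any(
            all(o[i] >= s[i] for i in TASKS) and any(o[i] > s[i] for i in TASKS)
            for _, o in candidates
        )
        if not dominated:
            front.append((p, s))
    return front

def gepa_loop(seed_prompt, budget=8):
    """Genetic loop: pick a parent from the Pareto front, reflect,
    mutate, re-evaluate, and keep the pool growing."""
    pool = [(seed_prompt, evaluate(seed_prompt))]
    for _ in range(budget):
        parent, scores = random.choice(pareto_front(pool))
        child = reflect_and_mutate(parent, scores)
        pool.append((child, evaluate(child)))
    # Return the candidate with the best average score across tasks.
    return max(pool, key=lambda ps: sum(ps[1]) / len(TASKS))

best, best_scores = gepa_loop("You are a helpful assistant.")
print(best_scores)
```

The Pareto-front selection is the key design choice: rather than greedily keeping only the single best prompt, it preserves candidates that excel on different tasks, so a later mutation can combine their complementary lessons.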
