[2507.19457] GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Summary
The paper introduces GEPA, a novel prompt optimizer that leverages natural language reflection to enhance learning efficiency in large language models, outperforming traditional reinforcement learning methods.
Why It Matters
As AI systems increasingly rely on reinforcement learning for task adaptation, GEPA represents a significant advance by demonstrating that natural language reflection can serve as a richer learning signal than sparse scalar rewards. This points toward substantially more sample-efficient training of LLM-based systems and better performance across applications.
Key Takeaways
- GEPA outperforms traditional reinforcement learning methods by 6% on average.
- It requires up to 35 times fewer rollouts to achieve significant quality gains.
- Natural language reflection is a more effective learning medium than sparse, scalar rewards.
- GEPA shows promising results in code optimization tasks.
- The code for GEPA is publicly available, encouraging further research and application.
Computer Science > Computation and Language
arXiv:2507.19457 (cs)
[Submitted on 25 Jul 2025 (v1), last revised 14 Feb 2026 (this version, v2)]
Title: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Authors: Lakshya A Agrawal, Shangyin Tan, Dilara Soylu, Noah Ziems, Rishi Khare, Krista Opsahl-Ong, Arnav Singhvi, Herumb Shandilya, Michael J Ryan, Meng Jiang, Christopher Potts, Koushik Sen, Alexandros G. Dimakis, Ion Stoica, Dan Klein, Matei Zaharia, Omar Khattab
Abstract: Large language models (LLMs) are increasingly adapted to downstream tasks via reinforcement learning (RL) methods like Group Relative Policy Optimization (GRPO), which often require thousands of rollouts to learn new tasks. We argue that the interpretable nature of language often provides a much richer learning medium for LLMs, compared to policy gradients derived from sparse, scalar rewards. To test this, we introduce GEPA (Genetic-Pareto), a prompt optimizer that thoroughly incorporates natural language reflection to learn high-level rules from trial and error. Given any AI system containing one or more LLM prompts, GEPA samples trajectories (e.g., reasoning, tool calls, and tool outputs) and reflects on them in natural language to diagnose problems, propose and test prompt updates, and combine complementary lessons from the Pareto ...
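The loop the abstract describes (sample a candidate prompt, reflect on its failures in natural language, propose a rewrite, and keep candidates that are best on at least one task) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `llm` is a hypothetical callable mapping a text prompt to a text completion, `evaluate` is a hypothetical scorer returning one score per task, and the Pareto rule here is a deliberately simple "best on at least one task" filter.

```python
import random

def reflective_prompt_evolution(llm, evaluate, seed_prompt, tasks, budget=20):
    """Sketch of a GEPA-style reflective loop (illustrative, not the paper's code).

    llm:       hypothetical callable(text) -> text
    evaluate:  hypothetical callable(prompt, tasks) -> list of per-task scores
    """
    # Candidate pool: (prompt, per-task scores) pairs.
    pool = [(seed_prompt, evaluate(seed_prompt, tasks))]
    for _ in range(budget):
        # Pick a parent from the pool and find its weakest task.
        parent, scores = random.choice(pool)
        worst = min(range(len(tasks)), key=lambda i: scores[i])
        # Reflect in natural language: diagnose the failure, propose a rewrite.
        child = llm(
            f"Prompt:\n{parent}\n\n"
            f"It scored {scores[worst]:.2f} on this task:\n{tasks[worst]}\n\n"
            "Diagnose the failure and rewrite the prompt to fix it."
        )
        pool.append((child, evaluate(child, tasks)))
        # Simple Pareto criterion: keep candidates that are best on some task.
        pool = [
            (p, s) for p, s in pool
            if any(s[i] >= max(q[i] for _, q in pool) for i in range(len(tasks)))
        ]
    # Return the surviving candidate with the best average score.
    return max(pool, key=lambda ps: sum(ps[1]) / len(tasks))[0]
```

The per-task Pareto filter, rather than keeping only the single best-average prompt, is what lets complementary lessons survive long enough to be combined, which the abstract identifies as a key ingredient of GEPA.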