[2602.14697] Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs

arXiv - Machine Learning · 4 min read · Article

Summary

The paper proposes Evolutionary System Prompt Learning (E-SPL), which evolves system prompts alongside model weights during reinforcement learning for large language models (LLMs), improving performance on reasoning and agentic tasks.

Why It Matters

This research is significant as it addresses the challenge of enhancing LLMs' self-improvement capabilities. By integrating evolutionary strategies with reinforcement learning, it offers a novel approach to boost sample efficiency and generalization, which are critical for advancing AI applications.

Key Takeaways

  • E-SPL combines reinforcement learning with evolutionary strategies for LLMs.
  • The method improves both model context and weights simultaneously.
  • E-SPL enhances performance in reasoning tasks and generalization.
  • The approach shows significant gains in sample efficiency.
  • E-SPL outperforms traditional reflective prompt evolution methods.

Computer Science > Artificial Intelligence
arXiv:2602.14697 (cs) [Submitted on 16 Feb 2026]
Title: Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs
Authors: Lunjun Zhang, Ryan Chen, Bradly C. Stadie

Abstract: Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. It applies RL updates to model weights conditioned on each system prompt, and evolutionary updates to the system prompt population via LLM-driven mutation and crossover. Each system prompt has a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration batch. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization se...
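To make the loop described in the abstract concrete, below is a minimal sketch of one possible E-SPL-style outer loop. It is not the authors' implementation: the helper names (run_rollouts, rl_update, mutate_prompt, crossover_prompts) are hypothetical stubs, the rollouts and weight updates are replaced by toy placeholders, and a simple Elo-style update stands in for the TrueSkill rating the paper uses for evolutionary selection.

    # Sketch of an E-SPL-style outer loop, under the assumptions stated above.
    import random
    from dataclasses import dataclass, field

    @dataclass
    class PromptMember:
        text: str
        rating: float = 1000.0          # stand-in for a TrueSkill rating
        rewards: list = field(default_factory=list)

    def run_rollouts(prompt: str, batch_size: int = 8) -> float:
        """Stub: run the policy with this system prompt, return mean reward."""
        return random.random() + 0.01 * len(prompt)   # toy signal only

    def rl_update(prompt: str, mean_reward: float) -> None:
        """Stub: apply an RL update to shared weights, conditioned on this prompt."""
        pass

    def mutate_prompt(prompt: str) -> str:
        """Stub for LLM-driven mutation (the paper rewrites prompts with an LLM)."""
        return prompt + " Think step by step."

    def crossover_prompts(a: str, b: str) -> str:
        """Stub for LLM-driven crossover between two parent prompts."""
        return a + " " + b

    def rating_update(population, k: float = 32.0) -> None:
        """Elo-style pairwise update from within-batch relative performance."""
        for i, p in enumerate(population):
            for q in population[i + 1:]:
                expected = 1.0 / (1.0 + 10 ** ((q.rating - p.rating) / 400.0))
                outcome = 1.0 if p.rewards[-1] > q.rewards[-1] else 0.0
                p.rating += k * (outcome - expected)
                q.rating -= k * (outcome - expected)

    population = [PromptMember("You are a careful mathematical reasoner."),
                  PromptMember("Solve the problem and verify each step.")]

    for iteration in range(3):
        # 1) Roll out with several system prompts (in parallel in the paper).
        for member in population:
            member.rewards.append(run_rollouts(member.text))
            # 2) RL update to the shared weights, conditioned on this prompt.
            rl_update(member.text, member.rewards[-1])
        # 3) Rating update from relative performance within the iteration batch.
        rating_update(population)
        # 4) Evolutionary update: keep top-rated prompts, add mutated/crossed children.
        population.sort(key=lambda m: m.rating, reverse=True)
        parents = population[:2]
        population = parents + [
            PromptMember(mutate_prompt(parents[0].text)),
            PromptMember(crossover_prompts(parents[0].text, parents[1].text)),
        ]

The point of the sketch is the division of labor: the inner loop changes weights (procedural knowledge) while the outer evolutionary step changes the prompt population (declarative knowledge), with ratings deciding which prompts survive.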

