[2602.14697] Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs
Summary
The paper proposes Evolutionary System Prompt Learning (E-SPL) to enhance reinforcement learning in large language models (LLMs) by evolving a population of system prompts alongside model weights, improving performance on reasoning and agentic tasks.
Why It Matters
This research is significant as it addresses the challenge of enhancing LLMs' self-improvement capabilities. By integrating evolutionary strategies with reinforcement learning, it offers a novel approach to boost sample efficiency and generalization, which are critical for advancing AI applications.
Key Takeaways
- E-SPL combines reinforcement learning with evolutionary strategies for LLMs.
- The method jointly improves model contexts (system prompts) and model weights.
- E-SPL enhances performance in reasoning tasks and generalization.
- The approach shows significant gains in sample efficiency.
- E-SPL outperforms traditional reflective prompt evolution methods.
Paper Details
arXiv:2602.14697 (cs) [Submitted on 16 Feb 2026]
Title: Evolutionary System Prompt Learning can Facilitate Reinforcement Learning for LLMs
Authors: Lunjun Zhang, Ryan Chen, Bradly C. Stadie
Abstract: Building agentic systems that can autonomously self-improve from experience is a longstanding goal of AI. Large language models (LLMs) today primarily self-improve via two mechanisms: self-reflection for context updates, and reinforcement learning (RL) for weight updates. In this work, we propose Evolutionary System Prompt Learning (E-SPL), a method for jointly improving model contexts and model weights. In each RL iteration, E-SPL selects multiple system prompts and runs rollouts with each in parallel. It applies RL updates to model weights conditioned on each system prompt, and evolutionary updates to the system prompt population via LLM-driven mutation and crossover. Each system prompt has a TrueSkill rating for evolutionary selection, updated from relative performance within each RL iteration batch. E-SPL encourages a natural division between declarative knowledge encoded in prompts and procedural knowledge encoded in weights, resulting in improved performance across reasoning and agentic tasks. For instance, in an easy-to-hard (AIME $\rightarrow$ BeyondAIME) generalization se...
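To make the abstract's loop concrete, here is a minimal sketch of one E-SPL iteration. It is illustrative only, not the authors' implementation: the paper's TrueSkill ratings are approximated by a simple Elo-style pairwise update, the LLM-driven mutation and crossover operators are replaced with trivial string-manipulation stand-ins, and the RL weight update is noted as a comment rather than implemented. All function names and the constant `K` are hypothetical.

```python
import random

K = 32  # Elo step size (assumption; the paper uses TrueSkill, not Elo)

def elo_update(ratings, scores):
    """Update each prompt's rating from pairwise comparisons of batch scores."""
    new = dict(ratings)
    names = list(ratings)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            expected_a = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
            outcome_a = 1.0 if scores[a] > scores[b] else (0.0 if scores[a] < scores[b] else 0.5)
            delta = K * (outcome_a - expected_a)
            new[a] += delta
            new[b] -= delta
    return new

def mutate(prompt):
    """Stand-in for LLM-driven mutation of a system prompt."""
    return prompt + " Think step by step."

def crossover(p1, p2):
    """Stand-in for LLM-driven crossover of two system prompts."""
    return p1 + " " + p2

def espl_iteration(population, ratings, rollout_score, rng):
    """One E-SPL step: score each prompt, update ratings, evolve the population."""
    # Run rollouts with each system prompt (in the paper, in parallel).
    scores = {p: rollout_score(p) for p in population}
    # Update ratings from relative performance within this batch.
    ratings = elo_update(ratings, scores)
    # (RL updates to model weights, conditioned on each prompt, would happen here.)
    # Evolutionary selection: keep the top-rated half, then mutate and cross over.
    survivors = sorted(population, key=lambda p: ratings[p], reverse=True)
    top = survivors[: max(2, len(survivors) // 2)]
    children = [mutate(rng.choice(top)), crossover(top[0], top[1])]
    new_population = top + children
    ratings.update({c: 1000.0 for c in children if c not in ratings})
    return new_population, ratings
```

Under this sketch, a prompt that consistently outscores its batch peers climbs in rating and survives selection, while its offspring enter the pool at a neutral rating, which is the same "evaluate, rate, select, vary" shape the abstract describes.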