[2602.21320] Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
Summary
The paper presents Tool-R0, a framework for training self-evolving LLM agents capable of tool-learning without prior data, showcasing significant performance improvements through self-play reinforcement learning.
Why It Matters
This research addresses the limitations of traditional reinforcement learning methods that require extensive human supervision and predefined tasks. By enabling LLMs to learn and evolve autonomously, it paves the way for more advanced AI systems capable of adapting to complex, real-world scenarios without prior data, which is crucial for the future of AI development.
Key Takeaways
- Tool-R0 enables LLMs to learn tool-use from scratch without pre-existing data.
- The framework utilizes self-play reinforcement learning for continuous evolution.
- Empirical evaluations show a 92.5% relative improvement over the base model.
- Co-evolution of Generator and Solver enhances task-solving capabilities.
- Insights into curriculum dynamics and scaling behavior are provided.
Computer Science > Machine Learning arXiv:2602.21320 (cs) [Submitted on 24 Feb 2026]
Title: Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data
Authors: Emre Can Acikgoz, Cheng Qian, Jonas Hübotter, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur
Abstract: Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically under tightly controlled training setups: it often depends on carefully constructed task-solution pairs and substantial human supervision, which creates a fundamental obstacle to open-ended self-evolution toward superintelligent systems. In this paper, we propose the Tool-R0 framework for training general-purpose tool-calling agents from scratch with self-play RL, under a zero-data assumption. Initialized from the same base LLM, Tool-R0 co-evolves a Generator and a Solver with complementary rewards: one proposes targeted, challenging tasks at the other's competence frontier, and the other learns to solve them with real-world tool calls. This creates a self-evolving cycle that requires no pre-existing tasks or datasets. Evaluation on different tool-use benchmarks shows that Tool-R0 yields a 92.5% relative improvement over the base...
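The Generator/Solver cycle described in the abstract can be illustrated with a toy numerical sketch. This is not the paper's method (which trains LLMs with RL and real tool calls); it is a minimal, hypothetical model where the Solver's competence and the Generator's task difficulty are single numbers, the Solver is "rewarded" by gaining skill on tasks it solves near its frontier, and the Generator's complementary reward steers proposed difficulty toward a roughly 50% solve rate. All class names, parameters, and update rules here are illustrative assumptions.

```python
import random


class Solver:
    """Toy stand-in for the Solver agent (not the paper's LLM Solver).

    Skill is a single number; it grows only when the Solver succeeds on
    tasks close to or above its current skill (its competence frontier).
    """

    def __init__(self, skill=0.0):
        self.skill = skill

    def attempt(self, difficulty, rng):
        # Success probability drops as difficulty exceeds current skill.
        p = 1.0 / (1.0 + 2.0 ** (difficulty - self.skill))
        solved = rng.random() < p
        if solved:
            # Solver reward: learn more from harder (frontier) tasks,
            # nothing from tasks far below its skill.
            self.skill += max(0.0, 0.2 * (difficulty - self.skill + 0.5))
        return solved


class Generator:
    """Toy stand-in for the Generator agent (illustrative assumption).

    Its complementary reward is modeled as pushing proposed difficulty
    toward the point where the Solver succeeds about half the time.
    """

    def __init__(self, difficulty=0.0):
        self.difficulty = difficulty

    def propose(self):
        return self.difficulty

    def update(self, solve_rate):
        # Raise difficulty if the Solver solves too many tasks,
        # lower it if the Solver solves too few.
        self.difficulty += 0.2 * (solve_rate - 0.5)


def self_play(rounds=200, batch=16, seed=0):
    """Run the zero-data self-play loop: no external tasks or labels."""
    rng = random.Random(seed)
    gen, sol = Generator(), Solver()
    for _ in range(rounds):
        solved = sum(sol.attempt(gen.propose(), rng) for _ in range(batch))
        gen.update(solved / batch)
    return gen.difficulty, sol.skill
```

Running `self_play()` shows the intended curriculum dynamics in miniature: task difficulty and Solver skill climb together from zero, with the Generator tracking the Solver's moving frontier rather than sampling from any fixed dataset.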