[2602.21320] Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

arXiv - Machine Learning 4 min read Article

Summary

The paper presents Tool-R0, a framework for training self-evolving LLM agents that learn tool use without any pre-existing data, achieving a 92.5% relative improvement over the base model through self-play reinforcement learning.

Why It Matters

This research addresses a key limitation of conventional reinforcement learning for agents: its reliance on extensive human supervision and carefully constructed task-solution pairs. By enabling LLMs to generate their own training curriculum and evolve autonomously, it points toward AI systems that can adapt to complex, real-world scenarios without any prior data.

Key Takeaways

  • Tool-R0 enables LLMs to learn tool use from scratch, without pre-existing data.
  • The framework utilizes self-play reinforcement learning for continuous evolution.
  • Empirical evaluations show a 92.5% relative improvement over the base model.
  • Co-evolution of Generator and Solver enhances task-solving capabilities.
  • Insights into curriculum dynamics and scaling behavior are provided.
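The "competence frontier" idea behind the Generator's reward can be illustrated with a simple shaping function: it peaks when the Solver succeeds about half the time, so proposed tasks are neither trivial nor impossible. This is a hypothetical illustration of the concept, not the paper's actual reward definition:

```python
def frontier_reward(success_rate: float, target: float = 0.5) -> float:
    """Hypothetical Generator reward: maximal when the Solver's success
    rate on a proposed task sits at the target (the competence frontier),
    decaying linearly to zero for tasks that are too easy or too hard.
    Illustrative only; the paper's exact reward is not reproduced here."""
    return max(0.0, 1.0 - abs(success_rate - target) / target)

frontier_reward(0.5)  # 1.0 — task at the frontier
frontier_reward(1.0)  # 0.0 — task too easy
frontier_reward(0.0)  # 0.0 — task too hard
```

A reward shaped this way pushes the Generator to keep pace with the Solver: as the Solver improves, yesterday's frontier tasks become "too easy" and stop paying off.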

Computer Science > Machine Learning

arXiv:2602.21320 (cs) · Submitted on 24 Feb 2026

Title: Tool-R0: Self-Evolving LLM Agents for Tool-Learning from Zero Data

Authors: Emre Can Acikgoz, Cheng Qian, Jonas Hübotter, Heng Ji, Dilek Hakkani-Tür, Gokhan Tur

Abstract: Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically under tightly controlled training setups: it often depends on carefully constructed task-solution pairs and substantial human supervision, which creates a fundamental obstacle to open-ended self-evolution toward superintelligent systems. In this paper, we propose the Tool-R0 framework for training general-purpose tool-calling agents from scratch with self-play RL, under a zero-data assumption. Initialized from the same base LLM, Tool-R0 co-evolves a Generator and a Solver with complementary rewards: one proposes targeted, challenging tasks at the other's competence frontier, and the other learns to solve them with real-world tool calls. This creates a self-evolving cycle that requires no pre-existing tasks or datasets. Evaluations on different tool-use benchmarks show that Tool-R0 yields a 92.5% relative improvement over the base...
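The Generator–Solver cycle described in the abstract can be sketched as a toy co-evolution loop. Everything here is a stand-in: the `Solver` is a scalar "skill" that grows with each success (in place of an RL policy update on real tool calls), the `Generator` is a scalar "difficulty" nudged toward a 50% success rate, and the update rules are illustrative, not the paper's method:

```python
import random

class Solver:
    """Toy Solver: succeeds when its skill comfortably exceeds the task
    difficulty; each success slightly raises its skill (a stand-in for
    an RL policy update driven by real-world tool calls)."""
    def __init__(self, skill=0.2):
        self.skill = skill

    def attempt(self, difficulty, rng):
        p = max(0.0, min(1.0, self.skill - difficulty + 0.5))
        success = rng.random() < p
        if success:
            self.skill += 0.002
        return success

class Generator:
    """Toy Generator: tracks one difficulty level and moves it toward the
    Solver's competence frontier (about 50% success)."""
    def __init__(self, difficulty=0.0):
        self.difficulty = difficulty

    def propose(self):
        return self.difficulty

    def update(self, success_rate, lr=0.05):
        # Too many successes -> tasks are too easy, raise difficulty;
        # too many failures -> lower it.
        self.difficulty += lr * (success_rate - 0.5)

def self_play(rounds=200, attempts=8, seed=0):
    """Run the self-evolving cycle: propose tasks, attempt them,
    update both agents from the observed success rate."""
    rng = random.Random(seed)
    gen, sol = Generator(), Solver()
    for _ in range(rounds):
        task = gen.propose()
        successes = sum(sol.attempt(task, rng) for _ in range(attempts))
        gen.update(successes / attempts)
    return gen.difficulty, sol.skill

final_difficulty, final_skill = self_play()
```

Running the loop shows both quantities rising together: the Solver's skill climbs, and the Generator's difficulty tracks it from just below, keeping tasks at the frontier — the curriculum emerges from the interaction rather than from any pre-existing dataset.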

