[2602.13691] PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning

Summary

The paper presents PhGPO, an approach to long-horizon tool planning that learns pheromone-like tool-transition patterns from historically successful trajectories and uses them to guide the policy optimization of LLM agents.

Why It Matters

As LLM agents increasingly depend on multi-step tool planning, PhGPO targets the combinatorial explosion of the exploration space. Rather than treating each correct tool-use path as a one-off reward, it distills reusable tool-transition patterns from historical trajectories, improving both learning efficiency and final task performance.

Key Takeaways

  • PhGPO improves long-horizon tool planning by reusing patterns from historically successful trajectories.
  • The method distills pheromone-like tool-transition guidance from those trajectories and uses it to steer policy optimization (a rough sketch follows this list).
  • Experimental results validate PhGPO's effectiveness at improving LLM-agent performance.
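
The paper's code does not appear in this summary, so purely as an illustration of the ant-colony analogy, here is a minimal Python sketch of a pheromone table over tool-to-tool transitions: successful trajectories deposit pheromone on the edges they traverse, and evaporation decays stale entries. Every name in it (PheromoneTable, evaporation_rate, deposit) is hypothetical, not from the paper.

```python
from collections import defaultdict

class PheromoneTable:
    """Hypothetical ACO-style store of tool-transition strengths.

    Successful trajectories deposit pheromone on each (prev_tool ->
    next_tool) edge they traverse; evaporation decays stale entries so
    the table reflects recently useful transition patterns.
    """

    def __init__(self, evaporation_rate: float = 0.1, deposit: float = 1.0):
        self.evaporation_rate = evaporation_rate
        self.deposit = deposit
        self.tau = defaultdict(float)  # (prev_tool, next_tool) -> strength

    def evaporate(self) -> None:
        # Decay every edge, mirroring pheromone evaporation in ACO.
        for edge in self.tau:
            self.tau[edge] *= 1.0 - self.evaporation_rate

    def reinforce(self, tool_path: list[str]) -> None:
        # Deposit pheromone along a successful tool-use path.
        for edge in zip(tool_path, tool_path[1:]):
            self.tau[edge] += self.deposit

    def strength(self, prev_tool: str, next_tool: str) -> float:
        return self.tau[(prev_tool, next_tool)]
```

For example, after `table.reinforce(["search_docs", "read_file", "run_tests"])` and one `table.evaporate()`, `table.strength("search_docs", "read_file")` returns 0.9.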

Computer Science > Artificial Intelligence, arXiv:2602.13691 (cs)
[Submitted on 14 Feb 2026]

Title: PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
Authors: Yu Li, Guangfeng Cai, Shengtian Yang, Han Luo, Shuo Han, Xu He, Dong Li, Lei Feng

Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging because the exploration space suffers from a combinatorial explosion. In this scenario, even when a correct tool-use path is found, it is usually consumed as an immediate reward for the current training step and provides no reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns that can be leveraged throughout the whole training process. Inspired by ant colony optimization, where historically successful paths are reflected by pheromone, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., pheromone) from historical trajectories and then uses the learned pheromone to guide policy optimization. This learned pheromone provides explicit and reusable guidance that steers policy opt...
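
The abstract above is cut off before it explains how the learned pheromone enters the optimization, so the following is only one plausible reading under stated assumptions: treat the pheromone table from the earlier sketch as a shaping bonus on per-step advantages in a generic policy-gradient update. The weight `beta` and the `log1p` bonus are assumptions, not the paper's formulation.

```python
import math

def pheromone_shaped_advantages(
    transitions: list[tuple[str, str]],  # (prev_tool, next_tool) per step
    base_advantages: list[float],        # e.g., from a PPO/GRPO-style estimator
    table: PheromoneTable,               # the sketch above
    beta: float = 0.05,                  # assumed guidance weight
) -> list[float]:
    """Add a small log-pheromone bonus to each step's advantage so that
    tool transitions recurring in successful trajectories are upweighted
    during policy optimization. Purely illustrative, not the paper's method.
    """
    shaped = []
    for (prev_tool, next_tool), adv in zip(transitions, base_advantages):
        bonus = beta * math.log1p(table.strength(prev_tool, next_tool))
        shaped.append(adv + bonus)
    return shaped
```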
