[2602.13691] PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
Summary
The paper presents PhGPO, an approach to long-horizon tool planning that learns pheromone-like transition patterns from historically successful trajectories and uses them to guide policy optimization, improving how AI agents discover effective tool-use paths.
Why It Matters
As AI agents increasingly rely on complex tool planning, PhGPO addresses the combinatorial explosion of the exploration space. By reusing successful historical trajectories rather than discarding them after a single reward signal, the method improves training efficiency, making it relevant for advancing AI agents in practical applications.
Key Takeaways
- PhGPO improves long-horizon tool planning by learning from historical trajectories.
- The method uses pheromone-like guidance to bias tool-transition choices during policy optimization (see the sketch after this list).
- Experimental results validate the effectiveness of PhGPO in enhancing AI agent performance.
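To make the ant-colony analogy concrete, below is a minimal Python sketch of pheromone-style bookkeeping over tool transitions: successful trajectories deposit pheromone on the transition edges they traversed, and evaporation decays stale entries. The `PheromoneTable` class, its parameter names, and the exact update rule are illustrative assumptions for exposition, not the paper's actual implementation.

```python
from collections import defaultdict

class PheromoneTable:
    """Illustrative (hypothetical) pheromone store over tool-to-tool transitions.

    Mirrors ant colony optimization: successful trajectories deposit
    pheromone on the transitions they used, while evaporation decays
    stale entries so the table tracks recently useful patterns.
    """

    def __init__(self, evaporation_rate: float = 0.1, deposit: float = 1.0):
        self.evaporation_rate = evaporation_rate
        self.deposit = deposit
        # (prev_tool, next_tool) -> pheromone level
        self.tau = defaultdict(float)

    def update(self, successful_trajectories: list[list[str]]) -> None:
        # Evaporate all existing pheromone first.
        for edge in self.tau:
            self.tau[edge] *= 1.0 - self.evaporation_rate
        # Deposit along every tool transition in each successful trajectory.
        for traj in successful_trajectories:
            for a, b in zip(traj, traj[1:]):
                self.tau[(a, b)] += self.deposit

    def score(self, prev_tool: str, next_tool: str) -> float:
        return self.tau.get((prev_tool, next_tool), 0.0)

# Example: one successful tool-use path deposits pheromone on its edges.
table = PheromoneTable()
table.update([["search", "parse", "summarize"]])
assert table.score("search", "parse") > table.score("parse", "search")
```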
Computer Science > Artificial Intelligence
arXiv:2602.13691 (cs)
[Submitted on 14 Feb 2026]
Title: PhGPO: Pheromone-Guided Policy Optimization for Long-Horizon Tool Planning
Authors: Yu Li, Guangfeng Cai, Shengtian Yang, Han Luo, Shuo Han, Xu He, Dong Li, Lei Feng
Abstract: Recent advancements in Large Language Model (LLM) agents have demonstrated strong capabilities in executing complex tasks through tool use. However, long-horizon multi-step tool planning is challenging because the exploration space suffers from a combinatorial explosion. In this scenario, even when a correct tool-use path is found, it is usually treated as an immediate reward for the current training step and provides no reusable information for subsequent training. In this paper, we argue that historically successful trajectories contain reusable tool-transition patterns, which can be leveraged throughout the whole training process. Inspired by ant colony optimization, where historically successful paths are reflected by pheromone, we propose Pheromone-Guided Policy Optimization (PhGPO), which learns a trajectory-based transition pattern (i.e., pheromone) from historical trajectories and then uses the learned pheromone to guide policy optimization. This learned pheromone provides explicit and reusable guidance that steers policy opt...
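The abstract describes the learned pheromone as explicit, reusable guidance for policy optimization. One plausible way to realize this, sketched below under the assumption of an advantage-based policy-gradient update, is to shape each step's advantage with a bonus proportional to the pheromone level of the tool transition taken. The function `pheromone_guided_advantage`, the `guidance_weight` parameter, and the `log1p` scaling are hypothetical choices building on the `PheromoneTable` sketch above; the paper's actual guidance mechanism may differ.

```python
import math

def pheromone_guided_advantage(
    base_advantage: float,
    prev_tool: str,
    next_tool: str,
    table: "PheromoneTable",  # hypothetical class from the earlier sketch
    guidance_weight: float = 0.5,
) -> float:
    """Shape a per-step advantage with a pheromone bonus (illustrative).

    Steps that follow historically successful tool transitions receive a
    small extra advantage, steering the policy toward reusable patterns
    while leaving the underlying policy-gradient machinery unchanged.
    """
    # log1p keeps the bonus bounded-growth as pheromone accumulates.
    bonus = math.log1p(table.score(prev_tool, next_tool))
    return base_advantage + guidance_weight * bonus
```

A design note on this sketch: shaping the advantage (rather than hard-masking tool choices) keeps exploration possible on transitions with no pheromone, which matters early in training when the table is still sparse.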