[2602.22576] Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training

arXiv - Machine Learning

Summary

The paper presents Search-P1, a framework for enhancing Retrieval-Augmented Generation (RAG) training through path-centric reward shaping, improving reasoning accuracy in large language models (LLMs).

Why It Matters

This research addresses the limitations of traditional RAG methods by introducing a more efficient training framework that enhances the performance of LLMs in complex reasoning tasks. The findings could significantly impact the development of AI systems that require reliable multi-step reasoning capabilities, making it relevant for researchers and practitioners in AI and machine learning.

Key Takeaways

  • Search-P1 introduces path-centric reward shaping to improve RAG training.
  • The framework allows LLMs to learn from both successful and failed reasoning attempts.
  • Experiments show an average accuracy improvement of 7.7 points over existing methods.
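To make the first takeaway concrete, here is a minimal sketch of what a path-centric reward with order-agnostic step coverage and soft scoring could look like. This is an illustration based only on the abstract, not the paper's actual reward function; the `coverage_weight` parameter and the exact-match notion of "covering" a reference step are assumptions.

```python
from typing import List


def path_centric_reward(
    trajectory_steps: List[str],
    reference_steps: List[str],
    outcome_correct: bool,
    coverage_weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> float:
    """Toy path-centric reward: blend order-agnostic coverage of reference
    steps with the final outcome, so failed samples still yield signal."""
    if not reference_steps:
        return 1.0 if outcome_correct else 0.0
    # Order-agnostic coverage: count reference steps present anywhere
    # in the trajectory, regardless of the order they were taken in.
    covered = sum(1 for step in reference_steps if step in trajectory_steps)
    coverage = covered / len(reference_steps)
    outcome = 1.0 if outcome_correct else 0.0
    # Soft score: even with outcome = 0, partial coverage earns reward.
    return coverage_weight * coverage + (1.0 - coverage_weight) * outcome
```

A failed sample that nonetheless covers two of three reference steps would receive a nonzero reward here, which is the intuition behind extracting learning signal from failures.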

Computer Science > Computation and Language — arXiv:2602.22576 (cs)

Submitted on 26 Feb 2026

Title: Search-P1: Path-Centric Reward Shaping for Stable and Efficient Agentic RAG Training

Authors: Tianle Xia, Ming Xu, Lingxiang Hu, Yiding Sun, Wenwei Li, Linfang Shang, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

Abstract: Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by incorporating external knowledge, yet traditional single-round retrieval struggles with complex multi-step reasoning. Agentic RAG addresses this by enabling LLMs to dynamically decide when and what to retrieve, but current RL-based training methods suffer from sparse outcome rewards that discard intermediate signals and low sample efficiency where failed samples contribute nothing. We propose Search-P1, a framework that introduces path-centric reward shaping for agentic RAG training, comprising two key components: (1) Path-Centric Reward, which evaluates the structural quality of reasoning trajectories through order-agnostic step coverage and soft scoring that extracts learning signals even from failed samples, and (2) Dual-Track Path Scoring with offline-generated reference planners that assesses paths from both self-consistency and reference-alignment perspectives. Experiments on multiple QA benchmarks demonst...
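The abstract's second component, Dual-Track Path Scoring, combines a self-consistency track with a reference-alignment track against an offline-generated reference planner. A toy sketch of how such a blend could be computed is below; the majority-vote consistency measure, the Jaccard overlap for alignment, and the `alpha` mixing coefficient are all illustrative assumptions, not the paper's definitions.

```python
from collections import Counter
from typing import List, Set


def self_consistency(answers: List[str]) -> float:
    """Fraction of sampled answers agreeing with the majority answer."""
    if not answers:
        return 0.0
    _, majority_count = Counter(answers).most_common(1)[0]
    return majority_count / len(answers)


def reference_alignment(path: Set[str], reference_path: Set[str]) -> float:
    """Jaccard overlap between a trajectory's steps and a reference
    planner's steps (a stand-in for the paper's alignment measure)."""
    if not path and not reference_path:
        return 1.0
    return len(path & reference_path) / len(path | reference_path)


def dual_track_score(
    answers: List[str],
    path: Set[str],
    reference_path: Set[str],
    alpha: float = 0.5,  # hypothetical mixing coefficient
) -> float:
    """Blend the self-consistency and reference-alignment tracks."""
    return alpha * self_consistency(answers) + (1.0 - alpha) * reference_alignment(
        path, reference_path
    )
```

The two tracks are complementary: self-consistency needs no supervision but can reward confidently wrong paths, while reference alignment anchors scoring to an offline planner's trajectory.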

Related Articles

[2603.18532] Scaling Sim-to-Real Reinforcement Learning for Robot VLAs with Generative 3D Worlds

[2603.12702] FGTR: Fine-Grained Multi-Table Retrieval via Hierarchical LLM Reasoning

[2603.12681] Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

[2602.06098] A Theoretical Analysis of Test-Driven LLM Code Generation