[2603.22293] TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs

[2603.22293] TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2603.22293: TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs

Computer Science > Computation and Language arXiv:2603.22293 (cs) [Submitted on 11 Mar 2026] Title:TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs Authors:Yutao Xie, Nathaniel Thomas, Nicklas Hansen, Yang Fu, Li Erran Li, Xiaolong Wang View a PDF of the paper titled TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs, by Yutao Xie and 5 other authors View PDF HTML (experimental) Abstract:Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieved strong results on open-domain question answering (QA), but training still remains a significant challenge. The optimization is often unstable due to sparse rewards and difficult credit assignments across reasoning and tool calls. To address this, we introduce Turn-Level Information Potential Reward Shaping (TIPS), a simple framework that assigns dense, turn-level rewards to each reasoning + tool-call segment based on the increased likelihood of the correct answer under a teacher model. By leveraging the potential-based reward shaping, TIPS offers fine-grained and policy-invariant guidance that overcomes the limitations of outcome-only optimization. Evaluated on seven QA benchmarks, TIPS consistently outperforms GRPO/PPO baselines and substantially improves training stability. For instance, with a Qwen-2.5 7B Instruct model, TIPS improves the average Exact Match score by 11.8% and F1 by 13.6% relative to PPO. Our results demonstrate...

Originally published on March 25, 2026. Curated by AI News.

Related Articles

Llms

An attack class that passes every current LLM filter - no payload, no injection signature, no log trace

https://shapingrooms.com/research I published a paper today on something I've been calling postural manipulation. The short version: ordi...

Reddit - Artificial Intelligence · 1 min ·
Llms

[R] An attack class that passes every current LLM filter - no payload, no injection signature, no log trace

https://shapingrooms.com/research I've been documenting what I'm calling postural manipulation: a specific class of language that install...

Reddit - Machine Learning · 1 min ·
Llms

What does Gemini think of you?

I noticed that Gemini was referring back to a lot of queries I've made in the past and was using that knowledge to drive follow up prompt...

Reddit - Artificial Intelligence · 1 min ·
Llms

This app helps you see what LLMs you can run on your hardware

submitted by /u/dev_is_active [link] [comments]

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime