[2603.22293] TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs
Computer Science > Computation and Language
arXiv:2603.22293 (cs) [Submitted on 11 Mar 2026]
Title: TIPS: Turn-Level Information-Potential Reward Shaping for Search-Augmented LLMs
Authors: Yutao Xie, Nathaniel Thomas, Nicklas Hansen, Yang Fu, Li Erran Li, Xiaolong Wang

Abstract: Search-augmented large language models (LLMs) trained with reinforcement learning (RL) have achieved strong results on open-domain question answering (QA), but training remains a significant challenge: optimization is often unstable due to sparse rewards and difficult credit assignment across reasoning and tool calls. To address this, we introduce Turn-Level Information-Potential Reward Shaping (TIPS), a simple framework that assigns a dense, turn-level reward to each reasoning + tool-call segment based on the increase in the likelihood of the correct answer under a teacher model. By leveraging potential-based reward shaping, TIPS offers fine-grained, policy-invariant guidance that overcomes the limitations of outcome-only optimization. Evaluated on seven QA benchmarks, TIPS consistently outperforms GRPO/PPO baselines and substantially improves training stability. For instance, with a Qwen-2.5 7B Instruct model, TIPS improves the average Exact Match score by 11.8% and F1 by 13.6% relative to PPO. Our results demonstrate...
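The shaping scheme the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the potential Phi(s_t) is the teacher model's log-likelihood of the gold answer given the trajectory prefix after turn t, and applies the standard potential-based shaping term F_t = gamma * Phi(s_{t+1}) - Phi(s_t), whose policy-invariance property the abstract invokes. The function name and signature are hypothetical.

```python
def shaped_turn_rewards(teacher_logliks, outcome_reward, gamma=1.0):
    """Hedged sketch of turn-level potential-based reward shaping.

    teacher_logliks: potentials [Phi_0, ..., Phi_T], where Phi_t is the
        teacher's log-likelihood of the correct answer after turn t
        (Phi_0 is measured before any reasoning or tool call).
    outcome_reward: sparse terminal reward, e.g. 1.0 for an exact match.
    Returns one dense reward per reasoning + tool-call turn.
    """
    num_turns = len(teacher_logliks) - 1
    rewards = []
    for t in range(num_turns):
        # Dense shaping signal: how much this turn increased the
        # teacher's belief in the correct answer.
        f_t = gamma * teacher_logliks[t + 1] - teacher_logliks[t]
        # The sparse outcome reward arrives only on the final turn.
        r_t = outcome_reward if t == num_turns - 1 else 0.0
        rewards.append(r_t + f_t)
    return rewards

# Example: a helpful search (turn 0), a wasted call (turn 1),
# and a turn that locks in the answer (turn 2).
print(shaped_turn_rewards([-5.0, -3.0, -3.5, -0.5], 1.0))
# -> [2.0, -0.5, 4.0]
```

Note that with gamma = 1 the shaping terms telescope: they sum to Phi_T - Phi_0, so the trajectory return equals the outcome reward plus the total information gained about the answer, which is why shaping of this form does not change the optimal policy.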