[2603.19310] MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Computer Science > Machine Learning
arXiv:2603.19310 (cs)
[Submitted on 13 Mar 2026]

Title: MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Authors: Tianyang Luo, Tao Feng, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You

Abstract: Training large language models (LLMs) for complex reasoning via reinforcement learning requires reward labels that specify whether generated rollouts are correct. However, obtaining reward labels at scale often requires expensive human labeling or time-consuming verification procedures; for instance, evaluating mathematical proofs demands expert review, while open-ended question answering lacks definitive ground truth. When reward labels are scarce, the effectiveness of reinforcement-learning fine-tuning is correspondingly constrained. We introduce MemReward, a graph-based experience memory framework: an initial LLM policy generates rollouts for each query, each comprising a thinking process and a final answer, and these rollouts are stored as experience memory. Queries, thinking processes, and answers form nodes in a heterogeneous graph with similarity and structural edges; a GNN trained on labeled nodes propagates rewards to unlabeled rollouts during online optimization. Experiments on Qwen2.5-3B and...
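The graph construction and reward propagation described in the abstract can be illustrated with a small sketch. The paper trains a GNN on the heterogeneous graph; as a stand-in, the sketch below uses simple iterative label propagation over the same kind of graph (structural query-thinking-answer edges plus cosine-similarity edges between node embeddings). All function names, the similarity threshold, and the propagation scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_adjacency(n, structural_edges, embeddings, sim_threshold=0.8):
    """Adjacency matrix combining structural edges (query-thinking-answer
    chains) with similarity edges between node embeddings.
    Threshold and edge weights are illustrative choices."""
    A = np.zeros((n, n))
    for i, j in structural_edges:          # structural edges, weight 1
        A[i, j] = A[j, i] = 1.0
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norm @ norm.T                    # pairwise cosine similarity
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > sim_threshold:  # similarity edge, weight = cos sim
                A[i, j] = A[j, i] = max(A[i, j], sim[i, j])
    return A

def propagate_rewards(A, rewards, labeled_mask, iters=50):
    """Iteratively average neighbors' rewards, clamping labeled nodes to
    their known values (label propagation standing in for a trained GNN)."""
    r = np.where(labeled_mask, rewards, 0.5)   # unlabeled start at neutral 0.5
    deg = A.sum(axis=1)
    for _ in range(iters):
        nbr = A @ r / np.maximum(deg, 1e-9)    # degree-normalized average
        r = np.where(labeled_mask, rewards, nbr)
    return r

# Toy graph: one query (0) with two rollouts, each a thinking node and an
# answer node; answer 2 is labeled correct (reward 1), answer 4 is unlabeled
# but embedded close to answer 2, so the label flows along the similarity edge.
emb = np.array([[1, 0, 0, 0],      # query
                [0, 1, 0, 0],      # thinking A
                [0, 0, 1, 0],      # answer A (labeled)
                [0, 0, 0, 1],      # thinking B
                [0, 0, 0.99, 0.14]])  # answer B (similar to answer A)
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
A = build_adjacency(5, edges, emb)
mask = np.array([False, False, True, False, False])
labels = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
r = propagate_rewards(A, labels, mask)
print(round(float(r[4]), 3))  # unlabeled answer inherits a high reward
```

The clamped labeled node acts as a boundary condition, so on this connected toy graph all unlabeled rewards drift toward the single known label; with mixed labels they settle between the clamped values.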