[2603.19310] MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Computer Science > Machine Learning
arXiv:2603.19310 (cs)
[Submitted on 13 Mar 2026]

Title: MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels
Authors: Tianyang Luo, Tao Feng, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You

Abstract: Training large language models (LLMs) for complex reasoning via reinforcement learning requires reward labels that specify whether generated rollouts are correct. However, obtaining reward labels at scale often requires expensive human labeling or time-consuming verification procedures; for instance, evaluating mathematical proofs demands expert review, while open-ended question answering lacks definitive ground truth. When reward labels are scarce, the effectiveness of reinforcement-learning fine-tuning is correspondingly constrained. We introduce MemReward, a graph-based experience memory framework: an initial LLM policy generates rollouts for each query, each comprising a thinking process and a final answer, and these rollouts are stored as experience memory. Queries, thinking processes, and answers form nodes in a heterogeneous graph with similarity and structural edges; a GNN trained on labeled nodes propagates rewards to unlabeled rollouts during online optimization. Experiments on Qwen2.5-3B and...
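The graph construction and reward propagation described in the abstract can be illustrated with a small sketch. The paper trains a GNN on the heterogeneous graph; as a stand-in, the sketch below uses simple iterative label propagation over the same kind of graph (structural query-thinking-answer edges plus cosine-similarity edges between node embeddings). All function names, the similarity threshold, and the propagation scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def build_adjacency(n, structural_edges, embeddings, sim_threshold=0.8):
    """Adjacency matrix combining structural edges (query-thinking-answer
    chains) with similarity edges between node embeddings.
    Threshold and edge weights are illustrative choices."""
    A = np.zeros((n, n))
    for i, j in structural_edges:          # structural edges, weight 1
        A[i, j] = A[j, i] = 1.0
    norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = norm @ norm.T                    # pairwise cosine similarity
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > sim_threshold:  # similarity edge, weight = cos sim
                A[i, j] = A[j, i] = max(A[i, j], sim[i, j])
    return A

def propagate_rewards(A, rewards, labeled_mask, iters=50):
    """Iteratively average neighbors' rewards, clamping labeled nodes to
    their known values (label propagation standing in for a trained GNN)."""
    r = np.where(labeled_mask, rewards, 0.5)   # unlabeled start at neutral 0.5
    deg = A.sum(axis=1)
    for _ in range(iters):
        nbr = A @ r / np.maximum(deg, 1e-9)    # degree-normalized average
        r = np.where(labeled_mask, rewards, nbr)
    return r

# Toy graph: one query (0) with two rollouts, each a thinking node and an
# answer node; answer 2 is labeled correct (reward 1), answer 4 is unlabeled
# but embedded close to answer 2, so the label flows along the similarity edge.
emb = np.array([[1, 0, 0, 0],      # query
                [0, 1, 0, 0],      # thinking A
                [0, 0, 1, 0],      # answer A (labeled)
                [0, 0, 0, 1],      # thinking B
                [0, 0, 0.99, 0.14]])  # answer B (similar to answer A)
edges = [(0, 1), (1, 2), (0, 3), (3, 4)]
A = build_adjacency(5, edges, emb)
mask = np.array([False, False, True, False, False])
labels = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
r = propagate_rewards(A, labels, mask)
print(round(float(r[4]), 3))  # unlabeled answer inherits a high reward
```

The clamped labeled node acts as a boundary condition, so on this connected toy graph all unlabeled rewards drift toward the single known label; with mixed labels they settle between the clamped values.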