[2603.04597] Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
Computer Science > Computation and Language
arXiv:2603.04597 (cs)
[Submitted on 4 Mar 2026]
Title: Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
Authors: Lei Huang, Xiang Cheng, Chenxiao Zhao, Guobin Shen, Junjie Yang, Xiaocheng Feng, Yuxuan Gu, Xing Yu, Bing Qin
Abstract: Large language models (LLMs) typically receive diverse natural language (NL) feedback through interaction with the environment. However, current reinforcement learning (RL) algorithms rely solely on scalar rewards, leaving the rich information in NL feedback underutilized and leading to inefficient exploration. In this work, we propose GOLF, an RL framework that explicitly exploits group-level language feedback to guide targeted exploration through actionable refinements. GOLF aggregates two complementary feedback sources: (i) external critiques that pinpoint errors or propose targeted fixes, and (ii) intra-group attempts that supply alternative partial ideas and diverse failure patterns. These group-level feedbacks are aggregated to produce high-quality refinements, which are adaptively injected into training as off-policy scaffolds to provide targeted guidance in sparse-reward regions. Meanwhile, GOLF jointly optimizes generation and refinement within a unified RL loop, creating a virtuous c...
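
The aggregation and injection steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so every name here (`Attempt`, `build_refinement_prompt`, `maybe_inject_refinement`, `refine`, `score`, `threshold`) is hypothetical. The sketch also assumes that a "sparse-reward region" means a rollout group in which no attempt scores above a threshold, and that injection replaces one failed attempt with the refinement; both are guesses made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    text: str                 # a response sampled for the question
    reward: float             # scalar reward from the environment
    critique: str = ""        # external NL critique, if one was given
    off_policy: bool = False  # True only for injected refinements


def build_refinement_prompt(question: str, group: List[Attempt]) -> str:
    """Combine intra-group attempts and external critiques into one prompt."""
    parts = [f"Question: {question}"]
    for i, attempt in enumerate(group, start=1):
        parts.append(f"Attempt {i}: {attempt.text}")
        if attempt.critique:
            parts.append(f"Critique {i}: {attempt.critique}")
    parts.append("Write an improved answer using the attempts and critiques above.")
    return "\n".join(parts)


def maybe_inject_refinement(
    question: str,
    group: List[Attempt],
    refine: Callable[[str], str],    # hypothetical: prompt -> refined answer
    score: Callable[[str], float],   # hypothetical: answer -> scalar reward
    threshold: float = 0.0,
) -> List[Attempt]:
    """Inject a refinement only when the whole group failed (assumed rule)."""
    # Assumption: if any attempt already succeeds, no scaffold is needed.
    if any(a.reward > threshold for a in group):
        return group
    refined_text = refine(build_refinement_prompt(question, group))
    refined = Attempt(refined_text, score(refined_text), off_policy=True)
    # Assumption: a refinement that also fails carries no useful signal.
    if refined.reward <= threshold:
        return group
    # Assumption: replace the last failed attempt so the group size is unchanged.
    return group[:-1] + [refined]
```

In a real training loop, `refine` would call the policy model itself and the returned group would feed a group-based policy-gradient update, with the `off_policy` flag marking samples that may need separate treatment. How GOLF actually handles these steps is not stated in the abstract.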