[2603.24093] Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
Computer Science > Machine Learning
arXiv:2603.24093 (cs)
[Submitted on 25 Mar 2026]

Title: Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization
Authors: Fei Bai, Zhipeng Chen, Chuan Hao, Ming Yang, Ran Tao, Bryan Dai, Wayne Xin Zhao, Jian Yang, Hongteng Xu

Abstract: Recently, reinforcement learning (RL) has become an important approach to improving the capabilities of large language models (LLMs). In particular, reinforcement learning from verifiable rewards (RLVR) has emerged as a promising paradigm for reasoning tasks. However, existing RL-based training remains only a rough approximation of human learning. Human learners leverage both external and internal experience to guide exploration and gradually internalize useful trajectories into stable knowledge. Motivated by this gap, we ask: how can LLMs better utilize and internalize experience during RLVR training? To answer this question, we propose Dual Guidance Optimization (DGO), a unified framework that leverages external and internal experience to improve training effectiveness. Specifically, DGO first constructs an experience bank from previously explored trajectories. The policy then performs exploration under the joint guidance of the experience bank and the model...
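The abstract describes an experience bank built from previously explored trajectories that then guides further exploration. The paper's actual data structure and retrieval rule are not given in this excerpt, so the following is only a minimal hypothetical sketch of that idea: a per-prompt store that keeps the highest-reward trajectories and prepends a retrieved one to the prompt before a rollout. All names (`ExperienceBank`, `guided_rollout`, `capacity_per_prompt`) are illustrative assumptions, not the authors' API.

```python
from collections import defaultdict


class ExperienceBank:
    """Hypothetical store of explored trajectories, keyed by prompt.

    Keeps only the top-reward trajectories per prompt; the real DGO
    bank and its retrieval rule may differ from this sketch.
    """

    def __init__(self, capacity_per_prompt=4):
        self.capacity = capacity_per_prompt
        self.bank = defaultdict(list)  # prompt -> list of (reward, trajectory)

    def add(self, prompt, trajectory, reward):
        entries = self.bank[prompt]
        entries.append((reward, trajectory))
        # Sort by reward (descending) and trim to capacity.
        entries.sort(key=lambda e: e[0], reverse=True)
        del entries[self.capacity:]

    def retrieve(self, prompt, k=1):
        # Return up to k best stored trajectories for this prompt.
        return [traj for _, traj in self.bank[prompt][:k]]


def guided_rollout(policy, prompt, bank, k=1):
    """Explore under joint guidance: condition the policy on both the
    prompt and retrieved external experience (sketch only)."""
    hints = bank.retrieve(prompt, k)
    guided_prompt = "\n".join(hints + [prompt]) if hints else prompt
    return policy(guided_prompt)
```

In this sketch the "internal experience" side (the model's own guidance) is just the policy itself; the paper presumably combines the two signals in a more principled way during RLVR training.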