[2505.19862] REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
Computer Science > Computation and Language
arXiv:2505.19862 (cs)
[Submitted on 26 May 2025 (v1), last revised 27 Feb 2026 (this version, v2)]
Title: REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
Authors: Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jun Rao, Min Zhang
Abstract: Large Reasoning Models (LRMs) demonstrate strong performance in complex tasks but often face the challenge of overthinking, leading to substantially high inference costs. Existing approaches synthesize shorter reasoning responses for LRMs to learn, but are inefficient for online usage due to the time-consuming data generation and filtering processes. Meanwhile, online reinforcement learning mainly adopts a length reward to encourage short reasoning responses, but it tends to lose reflection ability and harm performance. To address these issues, we propose REA-RL, which introduces a small reflection model for efficient scaling in online training, offering both parallel sampling and sequential revision. Besides, a reflection reward is designed to further prevent LRMs from favoring short yet non-reflective responses. Experiments show that both methods maintain or enhance performance while significantly improving inference efficiency. Their combination achieves a good balance between performance and efficien...
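The abstract describes pairing a length reward with a reflection reward so that short but non-reflective responses are not favored. The toy sketch below illustrates that general idea only; it is not the paper's formulation. The cue-word list, the linear length reward, the density threshold `min_density`, the budget `max_tokens`, and all function names are hypothetical choices made for illustration.

```python
import re

# Hypothetical cue words; the abstract does not say which tokens count as reflection.
REFLECTION_CUES = ("wait", "however", "alternatively", "double-check")


def reflection_density(response: str) -> float:
    """Fraction of alphabetic word tokens that are reflection cues."""
    tokens = re.findall(r"[a-z\-]+", response.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in REFLECTION_CUES)
    return hits / len(tokens)


def length_reward(n_tokens: int, max_tokens: int) -> float:
    """1.0 for an empty response, falling linearly to 0.0 at max_tokens (assumed shape)."""
    return max(0.0, 1.0 - n_tokens / max_tokens)


def reflection_reward(density: float, min_density: float) -> float:
    """0.0 when density meets the threshold, a penalty in [-1, 0) below it (assumed shape)."""
    if density >= min_density:
        return 0.0
    return density / min_density - 1.0


def combined_reward(response: str, is_correct: bool,
                    max_tokens: int = 100, min_density: float = 0.02) -> float:
    """Sum of a correctness term, the length reward, and the reflection penalty."""
    n_tokens = len(response.split())  # whitespace split stands in for a real tokenizer
    accuracy = 1.0 if is_correct else 0.0
    return (accuracy
            + length_reward(n_tokens, max_tokens)
            + reflection_reward(reflection_density(response), min_density))


terse = "The answer is 4."
reflective = "2+2 is 4. Wait, check: 2+2=4. The answer is 4."
print(combined_reward(terse, True), combined_reward(reflective, True))
```

With these assumed settings, the terse response earns a higher length reward but takes the full reflection penalty, so the slightly longer response that contains a check scores higher overall. A length reward alone would rank them the other way, which is the failure mode the abstract attributes to length-only training.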