[2505.19862] REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning

arXiv - Machine Learning March 02, 2026 4 min read

Computer Science > Computation and Language
arXiv:2505.19862 (cs)
[Submitted on 26 May 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: REA-RL: Reflection-Aware Online Reinforcement Learning for Efficient Reasoning
Authors: Hexuan Deng, Wenxiang Jiao, Xuebo Liu, Jun Rao, Min Zhang

Abstract: Large Reasoning Models (LRMs) demonstrate strong performance in complex tasks but often face the challenge of overthinking, leading to substantially high inference costs. Existing approaches synthesize shorter reasoning responses for LRMs to learn, but are inefficient for online usage due to the time-consuming data generation and filtering processes. Meanwhile, online reinforcement learning mainly adopts a length reward to encourage short reasoning responses, but it tends to lose reflection ability and harm performance. To address these issues, we propose REA-RL, which introduces a small reflection model for efficient scaling in online training, offering both parallel sampling and sequential revision. Besides, a reflection reward is designed to further prevent LRMs from favoring short yet non-reflective responses. Experiments show that both methods maintain or enhance performance while significantly improving inference efficiency. Their combination achieves a good balance between performance and efficien...
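The abstract names a length reward and a reflection reward but does not give their formulas. The toy sketch below only illustrates the general idea of offsetting a length bonus with a penalty for responses that contain no reflective phrasing. Everything in it is an assumption made for illustration and is not the authors' formulation: the marker list, the function names, the linear shapes, and the default thresholds are all hypothetical.

```python
import re

# Hypothetical reflection markers (assumption: the abstract does not list any).
REFLECTION_MARKERS = ("wait", "let me check", "double-check", "alternatively")


def length_reward(num_tokens: int, max_tokens: int) -> float:
    """Linear bonus in [0, 1]; shorter responses score higher."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    clipped = min(max(num_tokens, 0), max_tokens)
    return 1.0 - clipped / max_tokens


def reflection_density(response: str) -> float:
    """Number of marker hits per whitespace-separated token."""
    tokens = response.split()
    if not tokens:
        return 0.0
    text = response.lower()
    hits = sum(
        len(re.findall(r"\b" + re.escape(marker) + r"\b", text))
        for marker in REFLECTION_MARKERS
    )
    return hits / len(tokens)


def reflection_reward(response: str, min_density: float) -> float:
    """0 when density reaches min_density, falling linearly to -1 at zero density."""
    if min_density <= 0:
        raise ValueError("min_density must be positive")
    density = reflection_density(response)
    if density >= min_density:
        return 0.0
    return density / min_density - 1.0


def combined_reward(
    response: str,
    is_correct: bool,
    max_tokens: int = 100,
    min_density: float = 0.05,
) -> float:
    """Correctness gate plus length bonus plus reflection penalty (toy weights)."""
    if not is_correct:
        return 0.0
    num_tokens = len(response.split())
    return (
        1.0
        + length_reward(num_tokens, max_tokens)
        + reflection_reward(response, min_density)
    )


short_answer = "The answer is 4."
reflective_answer = "2 + 2 is 4. Wait, let me check: 2 + 2 = 4. The answer is 4."

# With these toy settings the longer but reflective answer scores higher.
print(combined_reward(short_answer, True))
print(combined_reward(reflective_answer, True))
```

Under a length bonus alone, the four-token answer would win. The added penalty term is what tips the comparison toward the response that still checks its work, which is the failure mode the abstract says a pure length reward runs into.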

Originally published on March 02, 2026. Curated by AI News.
