[2603.25184] Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
Computer Science > Machine Learning
arXiv:2603.25184 (cs)
[Submitted on 26 Mar 2026]

Title: Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

Authors: Jiahao Wu, Ning Lu, Shengcai Liu, Kun Wang, Yanting Yang, Li Qing, Ke Tang

Abstract: Reinforcement learning (RL) has become essential for post-training large language models (LLMs) on reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, since a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase. Our experimental analysis reveals that sample utility is non-uniform and evolving: the strongest learning signals concentrate at the "learning edge", the intersection of intermediate difficulty and high uncertainty, which shifts as training proceeds. Motivated by this, we propose HIVE (History-Informed and online-VErified prompt selection), a dual-stage framework for data-efficient RL. HIVE utilizes historical reward trajectories for coarse selection and employs prompt entropy as a real-time proxy t...
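The abstract's dual-stage idea can be sketched in a few lines: first filter prompts whose historical mean reward sits in an intermediate band (neither trivially solved nor hopeless), then rank the survivors by an uncertainty score and keep the top-k. This is a minimal illustration of that selection pattern, not the paper's implementation; `history`, `entropy`, and `difficulty_band` are assumed, illustrative names.

```python
def select_prompts(prompts, history, entropy, k, difficulty_band=(0.2, 0.8)):
    """Sketch of dual-stage prompt selection for RL post-training.

    Stage 1 (coarse, history-informed): keep prompts whose historical
    mean reward lies in an intermediate band, where rollouts are most
    likely to produce nonzero advantages.

    Stage 2 (fine, online proxy): among survivors, rank by a per-prompt
    uncertainty score (e.g. an entropy estimate) and take the top-k,
    approximating the shifting "learning edge".

    history: prompt -> list of past rewards in [0, 1]
    entropy: prompt -> scalar uncertainty estimate
    """
    lo, hi = difficulty_band
    # Stage 1: intermediate difficulty by historical mean reward.
    coarse = [
        p for p in prompts
        if history.get(p) and lo <= sum(history[p]) / len(history[p]) <= hi
    ]
    # Stage 2: highest-uncertainty prompts first.
    coarse.sort(key=lambda p: entropy.get(p, 0.0), reverse=True)
    return coarse[:k]


if __name__ == "__main__":
    prompts = ["a", "b", "c", "d"]
    history = {"a": [1, 1, 1], "b": [0.5, 0.4], "c": [0, 0], "d": [0.6]}
    entropy = {"a": 2.0, "b": 1.5, "c": 3.0, "d": 1.8}
    # "a" is always solved, "c" never; "b" and "d" survive stage 1,
    # and "d" wins stage 2 on the higher uncertainty score.
    print(select_prompts(prompts, history, entropy, k=1))  # ['d']
```

In a real GRPO-style loop, the reward history would be updated after each rollout batch, so the selected set drifts with the policy, which is the abstract's point about the learning edge moving as training proceeds.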