[2603.23414] SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling
arXiv:2603.23414 (cs), Computer Science > Machine Learning
Submitted on 24 Mar 2026

Title: SortedRL: Accelerating RL Training for LLMs through Online Length-Aware Scheduling

Authors: Yiqi Zhang, Huiqiang Jiang, Xufang Luo, Zhihe Yang, Chengruidong Zhang, Yifei Shen, Dongsheng Li, Yuqing Yang, Lili Qiu, Yang You

Abstract: Scaling reinforcement learning (RL) has shown strong promise for enhancing the reasoning abilities of large language models (LLMs), particularly in tasks requiring long chain-of-thought generation. However, RL training efficiency is often bottlenecked by the rollout phase, which can account for up to 70% of total training time when generating long trajectories (e.g., 16k tokens), due to slow autoregressive generation and synchronization overhead between rollout and policy updates. We propose SortedRL, an online length-aware scheduling strategy designed to address this bottleneck by improving rollout efficiency and maintaining training stability. SortedRL reorders rollout samples based on output lengths, prioritizing short samples forming groups for early updates. This enables large rollout batches, flexible update batches, and near on-policy micro-curriculum construction simultaneously. To further accelerate the pipeline, SortedRL incorporates a mechanism to control the degree of off...
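
The reordering idea in the abstract (short samples grouped first so that policy updates can begin early) can be illustrated with a minimal sketch. This is not the authors' implementation. The names `Rollout`, `length_sorted_update_batches`, and `update_batch_size` are hypothetical, and the sketch assumes output lengths are already known, whereas an online scheduler would only observe them as generations finish. It also does not model the off-policy control mechanism the abstract mentions.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Rollout:
    # Hypothetical record type, not taken from the paper.
    prompt_id: int
    output_len: int  # number of generated tokens


def length_sorted_update_batches(
    rollouts: List[Rollout], update_batch_size: int
) -> Iterator[List[Rollout]]:
    """Yield update batches in order of increasing output length.

    Assumption: sorting by output length stands in for completion order,
    since shorter generations tend to finish first.
    """
    ordered = sorted(rollouts, key=lambda r: r.output_len)
    for start in range(0, len(ordered), update_batch_size):
        # Each slice is one group of similarly sized samples.
        yield ordered[start:start + update_batch_size]


# One large rollout batch, split into smaller update batches.
rollouts = [Rollout(i, n) for i, n in enumerate([900, 120, 16000, 450, 3000, 80])]
batches = list(length_sorted_update_batches(rollouts, update_batch_size=2))
```

In this toy run the two shortest samples form the first update batch, so an update could start while the 16k-token sample is, conceptually, still generating. The last batch may be smaller than `update_batch_size` when the rollout count is not a multiple of it.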