[2511.16665] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Computer Science > Machine Learning
arXiv:2511.16665 (cs)
[Submitted on 20 Nov 2025 (v1), last revised 20 Mar 2026 (this version, v3)]

Title: Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Authors: Qinghao Hu, Shang Yang, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han

Abstract: The emergence of Large Language Models (LLMs) with strong reasoning capabilities marks a significant milestone, unlocking new frontiers in complex problem-solving. However, training these reasoning models, typically using Reinforcement Learning (RL), encounters critical efficiency bottlenecks: response generation during RL training exhibits a persistent long-tail distribution, where a few very long responses dominate execution time, wasting resources and inflating costs. To address this, we propose TLT, a system that accelerates reasoning RL training losslessly by integrating adaptive speculative decoding. Applying speculative decoding in RL is challenging due to the dynamic workloads, evolving target model, and draft model training overhead. TLT overcomes these obstacles with two synergistic components: (1) Adaptive Drafter, a lightweight draft model trained continuously on idle GPUs during long-tail generation to maintain alignment with the target model at no ...
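To make the underlying mechanism concrete, the sketch below illustrates the core loop of speculative decoding that TLT builds on: a cheap draft model proposes a few tokens, and the target model verifies them, accepting the longest matching prefix. This is not the paper's TLT implementation; it is a minimal greedy-decoding toy in which `target_next` and `draft_next` are assumed stand-ins for the models' argmax next-token functions. Under greedy decoding, exact-match verification makes the procedure lossless: the output is identical to plain target decoding, only faster when the draft is well aligned.

```python
def speculative_decode(target_next, draft_next, prefix, k=4, max_len=20):
    """Greedy speculative decoding (toy sketch).

    target_next(ctx) / draft_next(ctx): return the argmax next token
    for a token list `ctx` (hypothetical stand-ins for real models).
    The draft proposes k tokens; the target accepts the longest
    matching prefix, then emits one corrected token, so every round
    makes progress. Lossless under greedy decoding.
    """
    out = list(prefix)
    while len(out) < max_len:
        # 1) Draft proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)

        # 2) Target verifies: accept tokens while they match what
        #    the target itself would have produced greedily.
        accepted, ctx = 0, list(out)
        for t in proposal:
            if target_next(ctx) == t:
                out.append(t)
                ctx.append(t)
                accepted += 1
                if len(out) >= max_len:
                    break
            else:
                break

        # 3) On a mismatch (or short acceptance), the target supplies
        #    the corrected token, guaranteeing forward progress.
        if len(out) < max_len and accepted < k:
            out.append(target_next(ctx))
    return out
```

A well-aligned draft lets the target accept up to k tokens per verification pass; a misaligned draft degrades to one token per pass but never changes the output, which is why keeping the drafter aligned with the evolving target model (the Adaptive Drafter's job) is what determines the speedup.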