[2510.19225] RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2510.19225 (cs) [Submitted on 22 Oct 2025 (v1), last revised 8 Apr 2026 (this version, v3)]

Title: RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs

Authors: Yongji Wu, Xueshen Liu, Haizhong Zheng, Juncheng Gu, Beidi Chen, Z. Morley Mao, Arvind Krishnamurthy, Ion Stoica

Abstract: Reinforcement learning (RL) has become essential for unlocking advanced reasoning capabilities in large language models (LLMs). RL workflows interleave rollout and training stages with fundamentally different resource requirements. Rollout typically dominates overall execution time, yet scales efficiently across multiple independent instances. In contrast, training requires tightly coupled GPUs with full-mesh communication. Existing RL frameworks fall into two categories: co-located and disaggregated architectures. Co-located frameworks fail to address this resource tension because they force both stages to share the same GPUs. Disaggregated architectures, absent modifications to well-established RL algorithms, suffer from resource under-utilization. Meanwhile, preemptible GPU resources, i.e., spot instances on public clouds and spare capacity in production clusters, present significant cost-saving oppor...