Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries
Published March 10, 2026

Authors: Amine Dirhoussi, Quentin Gallouédec, Kashif Rasul, Lewis Tunstall, Edward Beeching, Albert Villanova del Moral, Nouamane Tazi, Leandro von Werra, Sergio Paniego

TL;DR -- For those of you who don't have time to read 5,000 words about async RL plumbing (we get it, you have models to train):

The problem: In synchronous RL (reinforcement learning) training, data generation (model inference to create data samples) dominates wall-clock time. A single batch of 32K-token rollouts from a 32B (32-billion-parameter) model can take hours, while the training GPUs sit idle.

The solution everyone converged on: Disaggregate (separate) inference and training onto different GPU pools, connect them with a rollout buffer (temporary storage for model outputs), and transfer weights asynchronously (without waiting), so neither side blocks on the other.

We surveyed 16 open-source libraries that implement this pattern and compared them across 7 axes: orchestration primitives, buffer design, weight-sync protocols, staleness management, partial-rollout handling, LoRA support, and distributed training backends.

Key findings: Ray dominates orchestration (used by 8 of the 16 surveyed libraries)....
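To make the pattern concrete, here is a minimal sketch of the disaggregated loop in Python. All names (`RolloutBuffer`, `Rollout`, the capacity and staleness numbers) are illustrative assumptions, not the API of any surveyed library: an inference worker pushes rollouts tagged with the policy version that produced them into a bounded buffer, while the trainer pulls batches and drops rollouts that are too stale.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Rollout:
    tokens: list          # generated token ids
    policy_version: int   # version of the weights that produced them


class RolloutBuffer:
    """Bounded FIFO buffer decoupling inference from training.

    Hypothetical sketch: real systems (vLLM/SGLang servers feeding a
    trainer) use network transports, but the flow is the same.
    """

    def __init__(self, capacity: int, max_staleness: int):
        self._q = queue.Queue(maxsize=capacity)
        self._max_staleness = max_staleness

    def put(self, rollout: Rollout) -> None:
        # Blocks only when the trainer falls far behind (back-pressure).
        self._q.put(rollout)

    def get_batch(self, batch_size: int, current_version: int) -> list:
        """Return batch_size rollouts, discarding overly stale ones."""
        batch = []
        while len(batch) < batch_size:
            r = self._q.get()
            if current_version - r.policy_version <= self._max_staleness:
                batch.append(r)
            # else: rollout came from weights too many versions old; drop it
        return batch


def run_demo() -> int:
    buf = RolloutBuffer(capacity=8, max_staleness=1)
    policy_version = 0

    def generator():
        # Stand-in for an inference server generating rollouts.
        for step in range(8):
            buf.put(Rollout(tokens=[step], policy_version=policy_version))

    t = threading.Thread(target=generator)
    t.start()
    # Trainer consumes a batch while generation continues in the background.
    batch = buf.get_batch(4, current_version=policy_version)
    t.join()
    return len(batch)


if __name__ == "__main__":
    print(run_demo())  # → 4
```

The key property is that neither side waits on the other in steady state: generation blocks only when the buffer is full, and training blocks only when it is empty, which is exactly the back-pressure behavior the rollout buffer is meant to provide.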