[2510.06499] Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Computer Science > Computation and Language
arXiv:2510.06499 (cs)
[Submitted on 7 Oct 2025 (v1), last revised 10 Apr 2026 (this version, v2)]

Title: Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
Authors: Zhepeng Cen, Haolin Chen, Shiyu Wang, Zuxin Liu, Zhiwei Liu, Jielin Qiu, Ding Zhao, Silvio Savarese, Caiming Xiong, Huan Wang, Weiran Yao

Abstract: Large Language Models (LLMs) have achieved remarkable success through imitation learning on vast text corpora, but this paradigm creates a training-generation gap and limits robust reasoning. Reinforcement learning (RL) offers a more data-efficient solution capable of bridging this gap, yet its application has been constrained by a critical data bottleneck: existing RL datasets are orders of magnitude smaller and less diverse than web-scale pre-training corpora. To address this, we introduce the Webscale-RL pipeline, a scalable data engine that systematically converts large-scale pre-training documents into millions of diverse, verifiable question-answer pairs for RL. Using this pipeline, we construct the Webscale-RL dataset, containing 1.2 million examples across more than 9 domains. Our experiments show that the model trained on this dataset significantly outperforms continual pretraining and strong data refinement baseline...
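The abstract does not give implementation details, but the core idea of the pipeline (converting pre-training documents into verifiable question-answer pairs, keeping only pairs whose answers can be checked against the source) can be sketched roughly as below. This is a minimal illustration, not the authors' code: the function and parameter names (convert_documents_to_rl_data, generate_qa, verify) are hypothetical, and the QA generator and verifier stand in for whatever models the paper actually uses.

    from dataclasses import dataclass
    from typing import Callable, Iterable, Iterator

    @dataclass
    class QAPair:
        question: str
        answer: str
        domain: str
        source_doc_id: str

    def convert_documents_to_rl_data(
        documents: Iterable[dict],
        generate_qa: Callable[[str], list[tuple[str, str]]],
        verify: Callable[[str, str, str], bool],
    ) -> Iterator[QAPair]:
        """Turn raw pre-training documents into verifiable QA pairs for RL.

        For each document, a generator proposes candidate QA pairs grounded
        in the text; a verifier then checks each answer against the source
        document, and only pairs that pass are emitted.
        """
        for doc in documents:
            for question, answer in generate_qa(doc["text"]):
                # Keep only pairs whose answers are supported by the source,
                # so every retained example has a checkable reward signal.
                if verify(doc["text"], question, answer):
                    yield QAPair(
                        question=question,
                        answer=answer,
                        domain=doc.get("domain", "unknown"),
                        source_doc_id=doc["id"],
                    )

    if __name__ == "__main__":
        docs = [{"id": "doc-0", "domain": "science",
                 "text": "Water boils at 100 degrees Celsius at sea level."}]

        def toy_generate(text: str) -> list[tuple[str, str]]:
            # Placeholder for an LLM call that extracts grounded QA pairs.
            return [("At what temperature does water boil at sea level?",
                     "100 degrees Celsius")]

        def toy_verify(text: str, question: str, answer: str) -> bool:
            # Placeholder verifier: accept only answers found in the source.
            return answer.lower() in text.lower()

        for pair in convert_documents_to_rl_data(docs, toy_generate, toy_verify):
            print(pair)

Passing the generator and verifier in as callables keeps the sketch independent of any particular model API; in a real system both would be LLM-backed, and the verification step is what makes the resulting pairs usable as RL reward targets at scale.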