[2603.24202] A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
Computer Science > Machine Learning
arXiv:2603.24202 (cs)
[Submitted on 25 Mar 2026]

Title: A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
Authors: Cansu Sancaktar, David Zhang, Gabriel Synnaeve, Taco Cohen

Abstract: Reinforcement learning (RL) has emerged as a powerful paradigm for improving large language models beyond supervised fine-tuning, yet sustaining performance gains at scale remains an open challenge, as data diversity and structure, rather than volume alone, become the limiting factor. We address this by introducing a scalable multi-turn synthetic data generation pipeline in which a teacher model iteratively refines problems based on in-context student performance summaries, producing structured difficulty progressions without any teacher fine-tuning. Compared to single-turn generation, this multi-turn approach substantially improves the yield of valid synthetic problems and naturally produces stepping stones, i.e., easier and harder variants of the same core task, that support curriculum-based training. We systematically study how task difficulty, curriculum scheduling, and environment diversity interact during RL training across the Llama3.1-8B Instruct and Qwen3-8B Base model families, with additional scaling experiments on Qwen2.5-32B. Our results ...
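The abstract's multi-turn loop (teacher proposes a problem, student attempts it, and a performance summary drives easier or harder variants) can be sketched as follows. This is a minimal toy illustration, not the paper's pipeline: `teacher_propose`, `student_attempt`, `teacher_refine`, the integer difficulty scale, and the pass-rate thresholds are all hypothetical stand-ins for model calls.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    description: str
    difficulty: int  # toy scale: 1 (easy) .. 10 (hard)

def teacher_propose(seed: str) -> Problem:
    """Stand-in for a teacher-model call that drafts an initial problem."""
    return Problem(description=seed, difficulty=2)

def student_attempt(problem: Problem, skill: int = 4) -> float:
    """Stand-in for student rollouts; returns a toy pass rate in [0, 1]."""
    return 1.0 if problem.difficulty <= skill else 0.0

def teacher_refine(problem: Problem, pass_rate: float) -> Problem:
    """Emit an easier or harder variant from the in-context performance summary."""
    if pass_rate > 0.8:        # too easy -> produce a harder variant
        delta = 2
    elif pass_rate < 0.2:      # too hard -> produce an easier stepping stone
        delta = -2
    else:                      # in the useful band -> keep difficulty
        delta = 0
    d = min(10, max(1, problem.difficulty + delta))
    return Problem(description=problem.description, difficulty=d)

def multi_turn_pipeline(seed: str, turns: int = 3) -> list[Problem]:
    """Iteratively refine one seed task into a difficulty progression."""
    current = teacher_propose(seed)
    variants = []
    for _ in range(turns):
        rate = student_attempt(current)
        current = teacher_refine(current, rate)
        variants.append(current)
    return variants

variants = multi_turn_pipeline("reverse a linked list", turns=3)
print([v.difficulty for v in variants])
```

Under these toy rules the generated difficulties oscillate around the student's ability frontier, which is the intuition behind the "stepping stones" the abstract describes: variants of the same core task that bracket what the student can currently solve.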