[2505.21668] R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Computer Science > Artificial Intelligence
arXiv:2505.21668 (cs)
[Submitted on 27 May 2025 (v1), last revised 3 Mar 2026 (this version, v3)]

Title: R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Authors: Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Na Li, Chuchu Fan

Abstract: Practical guidance on training Large Language Models (LLMs) to leverage Code Interpreter across diverse tasks remains lacking. We present R1-Code-Interpreter, an extension of a text-only LLM trained via multi-turn supervised fine-tuning (SFT) and reinforcement learning (RL) to autonomously generate multiple code queries during step-by-step reasoning. Unlike prior RL + tool-use efforts focused on narrow domains such as math or retrieval, we curate 144 diverse reasoning and planning tasks and show that training a general-purpose Code Interpreter across them presents significant challenges due to task heterogeneity and scarcity of effective samples. To address this, we introduce a multi-stage curriculum learning approach that partitions training samples by measured improvement potential. The RL training prioritizes samples with higher potential and gradually shifts to lower-potential ones, increasing the average RL gains from merely ...
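The curriculum described in the abstract (partition samples by a measured improvement-potential score, then train on high-potential partitions first and shift to lower ones) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Sample` fields, the stage boundaries, and the assumption that each sample's potential is precomputed are all hypothetical.

```python
# Hedged sketch of multi-stage curriculum scheduling by improvement potential.
# Names, fields, and thresholds are illustrative assumptions, not the paper's code.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Sample:
    task_id: str
    prompt: str
    potential: float  # measured improvement potential (assumed precomputed)

def partition_by_potential(samples: List[Sample],
                           boundaries: List[float]) -> List[List[Sample]]:
    """Split samples into stages; stage 0 holds the highest-potential samples.

    `boundaries` are descending thresholds: a sample lands in the first stage
    whose threshold it meets, else in the final (lowest-potential) stage.
    """
    ordered = sorted(samples, key=lambda s: s.potential, reverse=True)
    stages: List[List[Sample]] = [[] for _ in range(len(boundaries) + 1)]
    for s in ordered:
        idx = next((i for i, b in enumerate(boundaries) if s.potential >= b),
                   len(boundaries))
        stages[idx].append(s)
    return stages

def run_curriculum(stages: List[List[Sample]],
                   train_stage: Callable[[List[Sample]], None]) -> None:
    # RL training prioritizes high-potential stages, then shifts to lower ones.
    for stage in stages:
        if stage:
            train_stage(stage)
```

With boundaries `[0.6, 0.3]`, a sample with potential 0.9 lands in stage 0, 0.5 in stage 1, and 0.1 in stage 2, so training visits the easiest-to-improve samples first.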