[2602.24286] CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Computer Science > Machine Learning
arXiv:2602.24286 (cs)
[Submitted on 27 Feb 2026]
Title: CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Authors: Weinan Dai, Hanlin Wu, Qiying Yu, Huan-ang Gao, Jiahao Li, Chengquan Jiang, Weiqiang Lou, Yufan Song, Hongli Yu, Jiaze Chen, Wei-Ying Ma, Ya-Qin Zhang, Jingjing Liu, Mingxuan Wang, Xin Liu, Hao Zhou
Abstract: GPU kernel optimization is fundamental to modern deep learning but remains a highly specialized task requiring deep hardware expertise. Despite strong performance in general programming, large language models (LLMs) remain uncompetitive with compiler-based systems such as this http URL for CUDA kernel generation. Existing CUDA code generation approaches either rely on training-free refinement or fine-tune models within fixed multi-turn execution-feedback loops, but both paradigms fail to fundamentally improve the model's intrinsic CUDA optimization ability, resulting in limited performance gains. We present CUDA Agent, a large-scale agentic reinforcement learning system that develops CUDA kernel expertise through three components: a scalable data synthesis pipeline, a skill-augmented CUDA development environment with automated verification and profiling to provide reliable reward signals, and reinforcement learning algorit...