[2602.23407] Learning to Generate Secure Code via Token-Level Rewards
arXiv:2602.23407 (cs) [Submitted on 26 Feb 2026]
Computer Science > Cryptography and Security

Title: Learning to Generate Secure Code via Token-Level Rewards
Authors: Jiazheng Quan, Xiaodong Li, Bin Wang, Guo An, Like Liu, Degen Huang, Lin Liu, Chengbin Hou

Abstract: Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce se...
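The contrast the abstract draws between instance-level and token-level rewards can be sketched in a few lines. The following is a minimal illustration, not the paper's actual SRCode reward function: the `critical_positions` flags and the `boost` weighting are hypothetical stand-ins for whatever mechanism the authors use to identify security-relevant tokens.

```python
import numpy as np

def instance_level_rewards(num_tokens: int, reward: float) -> np.ndarray:
    """Traditional scheme: one scalar reward for the whole generated
    sample, spread uniformly across every token."""
    return np.full(num_tokens, reward / num_tokens)

def token_level_rewards(num_tokens: int, reward: float,
                        critical_positions: list[int],
                        boost: float = 2.0) -> np.ndarray:
    """Hypothetical token-level scheme: tokens flagged as
    security-critical (e.g. a bounds check or a sanitization call)
    receive up-weighted credit, so the policy gradient concentrates
    on the local security pattern rather than the whole sequence."""
    weights = np.ones(num_tokens)
    for pos in critical_positions:
        weights[pos] *= boost
    # Normalize so the total reward mass matches the scalar reward.
    return reward * weights / weights.sum()

uniform = instance_level_rewards(8, 1.0)
focused = token_level_rewards(8, 1.0, critical_positions=[2, 3])
print(uniform)  # every token receives identical credit
print(focused)  # positions 2 and 3 carry double the weight
```

Under this toy scheme both vectors sum to the same scalar reward, but the token-level version shifts credit toward the positions that implement the security fix, which is the kind of "precise optimization of local security implementations" the abstract describes.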