[2602.17684] CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

arXiv - Machine Learning · 4 min read

Summary

The paper presents CodeScaler, an execution-free reward model that scales both code LLM training and test-time inference, outperforming binary execution-based RL baselines.

Why It Matters

As code generation becomes increasingly central to software development, scaling the training of code models efficiently is crucial. CodeScaler addresses a scalability bottleneck in reinforcement learning by removing the dependency on execution-based feedback from unit tests, which can be unreliable, slow, and limited in availability. This could lead to more robust systems that generate code with higher accuracy and lower latency.

Key Takeaways

  • CodeScaler improves code LLM performance by an average of +11.72 points across benchmarks.
  • It enables scalable reinforcement learning without the need for test cases.
  • The model achieves a 10-fold reduction in latency compared to traditional unit test approaches.
  • CodeScaler surpasses existing reward models in both code and general reasoning tasks.
  • It utilizes syntax-aware code extraction for stable optimization.

Computer Science > Machine Learning · arXiv:2602.17684 (cs) · Submitted on 4 Feb 2026

Title: CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models

Authors: Xiao Zhu, Xinyu Zhou, Boyu Zhu, Hanxu Hu, Mingzhe Du, Haotian Zhang, Huiming Wang, Zhijiang Guo

Abstract: Reinforcement Learning from Verifiable Rewards (RLVR) has driven recent progress in code large language models by leveraging execution-based feedback from unit tests, but its scalability is fundamentally constrained by the availability and reliability of high-quality test cases. We propose CodeScaler, an execution-free reward model designed to scale both reinforcement learning training and test-time inference for code generation. CodeScaler is trained on carefully curated preference data derived from verified code problems and incorporates syntax-aware code extraction and validity-preserving reward shaping to ensure stable and robust optimization. Across five coding benchmarks, CodeScaler improves Qwen3-8B-Base by an average of +11.72 points, outperforming binary execution-based RL by +1.82 points, and enables scalable reinforcement learning on synthetic datasets without any test cases. At inference time, CodeScaler serves as an effective test-time scaling method, achieving performance comparable...
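The abstract's test-time scaling use amounts to reward-guided reranking: sample several candidate solutions and keep the one the reward model scores highest, with no code ever executed. A minimal sketch, using a toy stand-in scorer where CodeScaler itself would go (the scorer below is purely illustrative):

```python
import ast

def best_of_n(candidates, reward_fn):
    """Return the highest-scoring candidate under the reward model.
    No candidate is executed; selection relies on the scorer alone."""
    return max(candidates, key=reward_fn)

def toy_reward(code: str) -> float:
    """Illustrative stand-in for a learned reward model: rejects
    unparseable code outright, then weakly prefers longer snippets."""
    try:
        ast.parse(code)
    except SyntaxError:
        return float("-inf")
    return float(len(code))

samples = [
    "def f(x): return x",
    "def f(x) return x",           # invalid: scored -inf, never selected
    "def f(x):\n    return x * 2",
]
print(best_of_n(samples, toy_reward))
```

Because scoring a candidate is a single forward pass rather than a sandboxed test run, this is where the claimed latency reduction over unit-test-based selection would come from.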

