[2603.05863] ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

arXiv - Machine Learning · 4 min read

Computer Science > Computation and Language
arXiv:2603.05863 (cs)
[Submitted on 6 Mar 2026 (v1), last revised 20 Apr 2026 (this version, v2)]

Title: ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
Authors: Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim

Abstract: While Large Language Models (LLMs) have revolutionized code generation, standard "System 1" approaches that generate solutions in a single forward pass often hit a performance ceiling on complex algorithmic tasks. Existing iterative refinement strategies attempt to bridge this gap at inference time, yet they predominantly rely on external oracles, execution feedback, or computationally expensive prompt-response cycles. In this work, we propose ReflexiCoder, a novel reinforcement learning (RL) framework that internalizes the structured reasoning trajectory (initial generation, bug- and optimization-aware reflection, and self-correction) directly into the model's weights. Unlike prior methods, ReflexiCoder shifts the paradigm from externally dependent refinement to intrinsic, fully autonomous self-reflection and self-correction capabilities at inference time. We utilize an RL-on...
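The generate → reflect → correct trajectory that the abstract describes can be sketched as a simple inference-time loop. Everything below is illustrative and not taken from the paper: the `model` callable, the prompt wording, and the stopping check are hypothetical stand-ins, and in ReflexiCoder this behavior is reportedly trained into a single model's weights via RL rather than orchestrated by outside scaffolding.

```python
def self_refine(model, task: str, max_rounds: int = 2) -> str:
    """Hypothetical generate/reflect/correct loop for a coding task.

    `model` is any callable mapping a prompt string to a response string.
    No external oracle or test execution is used: the same model produces
    the draft, the reflection, and the correction.
    """
    # "System 1" draft: a single forward pass over the task description.
    code = model(f"Write code for the task:\n{task}")

    for _ in range(max_rounds):
        # Bug- and optimization-aware reflection, produced by the model itself.
        reflection = model(
            f"Task:\n{task}\nCode:\n{code}\n"
            "List likely bugs and possible optimizations."
        )
        # Illustrative stopping heuristic; the paper's actual criterion
        # (if any) is not stated in the truncated abstract.
        if "no issues" in reflection.lower():
            break
        # Self-correction conditioned on the model's own reflection.
        code = model(
            f"Task:\n{task}\nCode:\n{code}\nReflection:\n{reflection}\n"
            "Rewrite the code, fixing the issues above."
        )
    return code
```

The key contrast with prior refinement pipelines is that nothing outside the model (compilers, unit tests, human feedback) appears in the loop; the reflection signal is intrinsic.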

Originally published on April 21, 2026. Curated by AI News.

Related Articles

- Project Idea. Dream display project. 3 LLMs spitball the idea and tech specs and programs needed. (Reddit - Artificial Intelligence · 1 min)
- [2604.07562] Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs (arXiv - Machine Learning · 4 min)
- [2604.07484] ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training (arXiv - Machine Learning · 4 min)
- [2601.21278] GeoRC: A Benchmark for Geolocation Reasoning Chains (arXiv - Machine Learning · 4 min)
