[2603.05863] ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

arXiv - Machine Learning · 4 min read

Computer Science > Computation and Language
arXiv:2603.05863 (cs)
[Submitted on 6 Mar 2026 (v1), last revised 20 Apr 2026 (this version, v2)]

Title: ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning
Authors: Juyong Jiang, Jiasi Shen, Sunghun Kim, Kang Min Yoo, Jeonghoon Kim, Sungju Kim

Abstract: While Large Language Models (LLMs) have revolutionized code generation, standard "System 1" approaches that generate solutions in a single forward pass often hit a performance ceiling on complex algorithmic tasks. Existing iterative refinement strategies attempt to bridge this gap at inference time, yet they predominantly rely on external oracles, execution feedback, or computationally expensive prompt-response cycles. In this work, we propose ReflexiCoder, a novel reinforcement learning (RL) framework that internalizes the structured reasoning trajectory (initial generation, bug- and optimization-aware reflection, and self-correction) directly into the model's weights. Unlike prior methods, ReflexiCoder shifts the paradigm from externally dependent refinement to intrinsic, fully autonomous self-reflection and self-correction capabilities at inference time. We utilize an RL-on...
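The generate → reflect → correct trajectory that the abstract describes can be sketched as a simple inference-time loop. Everything below is illustrative and not taken from the paper: the `model` callable, the prompt wording, and the stopping check are hypothetical stand-ins, and in ReflexiCoder this behavior is reportedly trained into a single model's weights via RL rather than orchestrated by outside scaffolding.

```python
def self_refine(model, task: str, max_rounds: int = 2) -> str:
    """Hypothetical generate/reflect/correct loop for a coding task.

    `model` is any callable mapping a prompt string to a response string.
    No external oracle or test execution is used: the same model produces
    the draft, the reflection, and the correction.
    """
    # "System 1" draft: a single forward pass over the task description.
    code = model(f"Write code for the task:\n{task}")

    for _ in range(max_rounds):
        # Bug- and optimization-aware reflection, produced by the model itself.
        reflection = model(
            f"Task:\n{task}\nCode:\n{code}\n"
            "List likely bugs and possible optimizations."
        )
        # Illustrative stopping heuristic; the paper's actual criterion
        # (if any) is not stated in the truncated abstract.
        if "no issues" in reflection.lower():
            break
        # Self-correction conditioned on the model's own reflection.
        code = model(
            f"Task:\n{task}\nCode:\n{code}\nReflection:\n{reflection}\n"
            "Rewrite the code, fixing the issues above."
        )
    return code
```

The key contrast with prior refinement pipelines is that nothing outside the model (compilers, unit tests, human feedback) appears in the loop; the reflection signal is intrinsic.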

Originally published on April 21, 2026. Curated by AI News.

Related Articles

- Project Idea. Dream display project. 3 LLMs spitball the idea and tech specs and programs needed. (Reddit - Artificial Intelligence · 1 min)
- [2604.07562] Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs (arXiv - Machine Learning · 4 min)
- [2604.07484] ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training (arXiv - Machine Learning · 4 min)
- [2601.21278] GeoRC: A Benchmark for Geolocation Reasoning Chains (arXiv - Machine Learning · 4 min)
