[2603.29292] Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus
Computer Science > Software Engineering

arXiv:2603.29292 (cs) [Submitted on 31 Mar 2026]

Title: Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus
Authors: Huan Zhang, Wei Cheng, Wei Hu

Abstract: Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable unit tests. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: can a code language model improve itself without access to a superior teacher or a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling a curriculum construction with the most learnable problems. Second, we present consensus-driven direct preference optimization (Con-DPO), a preference-based fine-tuning method that weights each preference pair by its behavioral consensus, thereby mitigating the impact of noisy self-generated s...
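The two ideas in the abstract can be made concrete with a small sketch: sample several candidate programs, group them by their observed behavior on shared test inputs (no oracle needed), then compute an entropy over the behavior clusters and a consensus weight from the majority cluster. This is an illustrative reconstruction, not the paper's implementation; all function names (`behavior_signature`, `code_semantic_entropy`, `consensus_weight`) are hypothetical.

```python
# Hypothetical sketch of code semantic entropy and behavioral consensus,
# as described in the abstract. Assumes candidate programs are callables
# and that only test INPUTS (not expected outputs) are available.
import math
from collections import Counter


def behavior_signature(program, test_inputs):
    """Run a candidate on shared test inputs and record its outputs.

    Two programs with identical signatures are behaviorally equivalent
    on these inputs, even if their source text differs.
    """
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(program(x))
        except Exception:
            outputs.append("<error>")  # crashing is itself a behavior
    return tuple(outputs)


def code_semantic_entropy(programs, test_inputs):
    """Entropy over behavior clusters: near zero when samples agree,
    high when the model's samples behave in many different ways."""
    sigs = [behavior_signature(p, test_inputs) for p in programs]
    counts = Counter(sigs)
    n = len(sigs)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def consensus_weight(programs, test_inputs):
    """Fraction of samples in the majority behavior cluster; a plausible
    weight for down-weighting noisy self-generated preference pairs."""
    sigs = [behavior_signature(p, test_inputs) for p in programs]
    counts = Counter(sigs)
    return counts.most_common(1)[0][1] / len(sigs)
```

For example, three sampled "absolute value" candidates where one is wrong fall into two behavior clusters on inputs `[-2, 0, 3]`, giving positive entropy and a consensus weight of 2/3:

```python
samples = [abs, lambda x: x if x >= 0 else -x, lambda x: x]
code_semantic_entropy(samples, [-2, 0, 3])  # > 0: samples disagree
consensus_weight(samples, [-2, 0, 3])       # 2/3: majority cluster size
```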