[2603.29292] Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus
Computer Science > Software Engineering

arXiv:2603.29292 (cs) [Submitted on 31 Mar 2026]

Title: Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus
Authors: Huan Zhang, Wei Cheng, Wei Hu

Abstract: Improving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable unit tests. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: can a code language model improve itself without access to a superior teacher or a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling a curriculum construction with the most learnable problems. Second, we present consensus-driven direct preference optimization (Con-DPO), a preference-based fine-tuning method that weights each preference pair by its behavioral consensus, thereby mitigating the impact of noisy self-generated s...
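The two ideas in the abstract can be made concrete with a small sketch: sample several candidate programs, group them by their observed behavior on shared test inputs (no oracle needed), then compute an entropy over the behavior clusters and a consensus weight from the majority cluster. This is an illustrative reconstruction, not the paper's implementation; all function names (`behavior_signature`, `code_semantic_entropy`, `consensus_weight`) are hypothetical.

```python
# Hypothetical sketch of code semantic entropy and behavioral consensus,
# as described in the abstract. Assumes candidate programs are callables
# and that only test INPUTS (not expected outputs) are available.
import math
from collections import Counter


def behavior_signature(program, test_inputs):
    """Run a candidate on shared test inputs and record its outputs.

    Two programs with identical signatures are behaviorally equivalent
    on these inputs, even if their source text differs.
    """
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(program(x))
        except Exception:
            outputs.append("<error>")  # crashing is itself a behavior
    return tuple(outputs)


def code_semantic_entropy(programs, test_inputs):
    """Entropy over behavior clusters: near zero when samples agree,
    high when the model's samples behave in many different ways."""
    sigs = [behavior_signature(p, test_inputs) for p in programs]
    counts = Counter(sigs)
    n = len(sigs)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def consensus_weight(programs, test_inputs):
    """Fraction of samples in the majority behavior cluster; a plausible
    weight for down-weighting noisy self-generated preference pairs."""
    sigs = [behavior_signature(p, test_inputs) for p in programs]
    counts = Counter(sigs)
    return counts.most_common(1)[0][1] / len(sigs)
```

For example, three sampled "absolute value" candidates where one is wrong fall into two behavior clusters on inputs `[-2, 0, 3]`, giving positive entropy and a consensus weight of 2/3:

```python
samples = [abs, lambda x: x if x >= 0 else -x, lambda x: x]
code_semantic_entropy(samples, [-2, 0, 3])  # > 0: samples disagree
consensus_weight(samples, [-2, 0, 3])       # 2/3: majority cluster size
```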