[2603.24965] Self-Corrected Image Generation with Explainable Latent Rewards
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.24965 (cs) [Submitted on 26 Mar 2026]

Title: Self-Corrected Image Generation with Explainable Latent Rewards
Authors: Yinyi Luo, Hrishikesh Gokhale, Marios Savvides, Jindong Wang, Shengfeng He

Abstract: Despite significant progress in text-to-image generation, aligning outputs with complex prompts remains challenging, particularly for fine-grained semantics and spatial relations. This difficulty stems from the feed-forward nature of generation, which requires anticipating alignment without fully understanding the output. In contrast, evaluating generated images is more tractable. Motivated by this asymmetry, we propose xLARD, a self-correcting framework that uses multimodal large language models to guide generation through Explainable LAtent RewarDs. xLARD introduces a lightweight corrector that refines latent representations based on structured feedback from model-generated references. A key component is a differentiable mapping from latent edits to interpretable reward signals, enabling continuous latent-level guidance from non-differentiable image-level evaluations. This mechanism allows the model to understand, assess, and correct itself during generation. Experiments across diverse generation and editing tasks show that xLARD improves semantic ali…
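The core idea of the abstract, steering a latent with a differentiable reward proxy when the image-level evaluation itself is non-differentiable, can be sketched in a toy form. The snippet below is a hypothetical illustration, not the paper's xLARD implementation: the linear "decoder" `W`, the reference features `target`, and the quadratic surrogate reward are all assumptions chosen to make gradient-based latent correction runnable in a few lines.

```python
import numpy as np

# Hypothetical sketch of latent-level reward guidance. All names are
# illustrative stand-ins, not components of the actual xLARD framework.

rng = np.random.default_rng(0)

W = rng.normal(size=(8, 16))    # frozen linear "decoder" feature map (stand-in)
target = rng.normal(size=8)     # features of a model-generated reference

def surrogate_reward(z):
    """Differentiable proxy reward: negative squared distance to the reference."""
    return -np.sum((W @ z - target) ** 2)

def reward_grad(z):
    """Analytic gradient of the surrogate reward w.r.t. the latent."""
    return -2.0 * W.T @ (W @ z - target)

def correct_latent(z, steps=200, lr=1e-2):
    """Lightweight corrector: gradient ascent on the surrogate reward in latent space."""
    for _ in range(steps):
        z = z + lr * reward_grad(z)
    return z

z0 = rng.normal(size=16)
z1 = correct_latent(z0)
print(surrogate_reward(z1) > surrogate_reward(z0))  # corrected latent scores higher
```

In the paper's setting the evaluator is a multimodal LLM and the reward signal is structured feedback; the sketch only shows the mechanism of replacing a non-differentiable image-level score with a differentiable latent-level objective.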