[2604.02355] From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
Computer Science > Machine Learning
arXiv:2604.02355 (cs)
[Submitted on 12 Mar 2026]

Title: From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation
Authors: Han Song, Yucheng Zhou, Jianbing Shen, Yu Cheng

Abstract: Combining Chain-of-Thought (CoT) with Reinforcement Learning (RL) improves text-to-image (T2I) generation, yet the underlying interaction between CoT's exploration and RL's optimization remains unclear. We present a systematic entropy-based analysis that yields three key insights: (1) CoT expands the generative exploration space, while RL contracts it toward high-reward regions; (2) final reward is strongly negatively correlated with both the mean and variance of image-token entropy, highlighting the need to reduce uncertainty and instability; and (3) the entropy of the textual CoT directly governs downstream image quality, with lower-entropy CoTs leading to better generations. Motivated by these findings, we propose Entropy-Guided Group Relative Policy Optimization (EG-GRPO), a fine-tuning strategy that reallocates optimization budget by uncertainty: low-entropy tokens are excluded from reward-driven updates to preserve stability, while high-entropy tokens receive an entropy bonus that encourages structured exploration...
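To make the entropy-gated update concrete, the sketch below shows one plausible reading of the abstract's mechanism: compute per-token entropy from the policy's logits, zero out the reward-driven advantage on tokens whose entropy falls below a threshold, and add an entropy bonus on the remaining high-entropy tokens. This is an illustrative NumPy sketch, not the paper's implementation; the threshold `tau`, bonus weight `beta`, and the additive form of the bonus are assumptions.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy of the softmax distribution at each token position.

    logits: array of shape (num_tokens, vocab_size).
    Returns an array of shape (num_tokens,).
    """
    z = logits - logits.max(axis=-1, keepdims=True)  # stabilize the softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def entropy_gated_advantages(logits, advantages, tau=0.5, beta=0.1):
    """Reallocate the optimization budget by token uncertainty (illustrative).

    Low-entropy (confident) tokens are excluded from the reward-driven
    update; high-entropy tokens keep their advantage and receive an
    entropy bonus that encourages exploration. `tau` and `beta` are
    hypothetical hyperparameters.
    """
    H = token_entropy(logits)
    low = H < tau
    shaped = np.where(low, 0.0, advantages)          # freeze stable tokens
    shaped = shaped + np.where(low, 0.0, beta * H)   # bonus on uncertain tokens
    return shaped, H
```

In a GRPO-style trainer, `shaped` would replace the group-relative advantage in the policy-gradient loss, so gradient signal concentrates on the uncertain tokens where exploration is still useful.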