[2603.02829] Toward Early Quality Assessment of Text-to-Image Diffusion Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.02829 (cs) [Submitted on 3 Mar 2026]

Title: Toward Early Quality Assessment of Text-to-Image Diffusion Models
Authors: Huanlei Guo, Hongxin Wei, Bingyi Jing

Abstract: Recent text-to-image (T2I) diffusion and flow-matching models can produce highly realistic images from natural language prompts. In practice, T2I systems are often run in a "generate-then-select" mode: many seeds are sampled and only a few images are kept for use. This pipeline is highly resource-intensive, however, since each candidate requires tens to hundreds of denoising steps, and evaluation metrics such as CLIPScore and ImageReward are applied only post hoc. In this work, we address this inefficiency by introducing Probe-Select, a plug-in module that enables efficient evaluation of image quality within the generation process itself. We observe that certain intermediate denoiser activations, even at early timesteps, encode a stable coarse structure (object layout and spatial arrangement) that strongly correlates with final image fidelity. Probe-Select exploits this property by predicting final quality scores directly from early activations, allowing unpromising seeds to be terminated early. Across diffusion and flow-matching backbones, our experiments show that early evaluation at only...
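The abstract does not give implementation details, but the core idea of predicting final quality from early activations can be sketched with a simple linear probe. The sketch below is a hypothetical illustration, not the paper's method: it uses random vectors as stand-ins for pooled early denoiser activations and a synthetic "final quality" target, fits a least-squares probe, and then uses the probe's predictions to keep only the top-k seeds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not from the paper): each seed yields one
# pooled early-activation vector and a final quality score (e.g. a
# reward-model rating obtained after full denoising).
n_seeds, feat_dim = 200, 32
activations = rng.normal(size=(n_seeds, feat_dim))
true_w = rng.normal(size=feat_dim)          # hidden activation-quality link
quality = activations @ true_w + 0.1 * rng.normal(size=n_seeds)

# Linear probe: least-squares fit from early activations to final quality.
w, *_ = np.linalg.lstsq(activations, quality, rcond=None)

# Early selection: score fresh seeds from their early activations alone
# and keep only the top-k; the rest would be terminated before finishing
# their tens to hundreds of denoising steps.
new_acts = rng.normal(size=(50, feat_dim))
pred_quality = new_acts @ w
k = 5
keep = np.argsort(pred_quality)[-k:]
print(f"keeping {len(keep)} of {len(new_acts)} seeds")
```

In a real system the probe would be trained once on (early activation, final score) pairs collected offline, so that at inference time each candidate seed pays only a few denoising steps before being scored and possibly discarded.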