[2603.03469] Biased Generalization in Diffusion Models
Computer Science > Machine Learning

arXiv:2603.03469 (cs) [Submitted on 3 Mar 2026]

Title: Biased Generalization in Diffusion Models
Authors: Jerome Garnier-Brun, Luca Biggio, Davide Beltrame, Marc Mézard, Luca Saglietti

Abstract: Generalization in generative modeling is defined as the ability to learn an underlying distribution from a finite dataset and produce novel samples, with evaluation largely driven by held-out performance and perceived sample quality. In practice, training is often stopped at the minimum of the test loss, taken as an operational indicator of generalization. We challenge this viewpoint by identifying a phase of biased generalization during training, in which the model continues to decrease the test loss while favoring samples with anomalously high proximity to training data. By training the same network on two disjoint datasets and comparing the mutual distances of generated samples and their similarity to training data, we introduce a quantitative measure of bias and demonstrate its presence on real images. We then study the mechanism of bias, using a controlled hierarchical data model where access to exact scores and ground-truth statistics allows us to precisely characterize its onset. We attribute this phenomenon to the sequential nature of feature learning in deep networks, where coarse structure is learned [...]
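The measurement protocol sketched in the abstract, training on two disjoint datasets and comparing how close generated samples sit to each one, can be illustrated with a minimal toy sketch. The function names (`nn_distances`, `bias_score`) and the ratio-based score are illustrative assumptions, not the paper's actual statistic:

```python
import numpy as np

def nn_distances(samples, dataset):
    # Euclidean distance from each sample to its nearest neighbor in dataset.
    # samples: (n, d), dataset: (m, d); broadcasting builds an (n, m, d) tensor.
    diffs = samples[:, None, :] - dataset[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)

def bias_score(generated, train_own, train_other):
    # Hypothetical bias proxy: mean nearest-neighbor distance to the model's
    # own training set, divided by the same quantity for a disjoint set drawn
    # from the same distribution. A score well below 1 flags generated samples
    # with anomalously high proximity to the training data.
    d_own = nn_distances(generated, train_own)
    d_other = nn_distances(generated, train_other)
    return d_own.mean() / d_other.mean()
```

For an unbiased generator, both sets are statistically interchangeable and the score hovers near 1; a model drifting toward its training examples pushes it toward 0.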