[2505.19046] When Models Don't Collapse: On the Consistency of Iterative MLE
Statistics > Machine Learning, arXiv:2505.19046 (stat)
[Submitted on 25 May 2025 (v1), last revised 26 Mar 2026 (this version, v3)]

Title: When Models Don't Collapse: On the Consistency of Iterative MLE
Authors: Daniel Barzilai, Ohad Shamir

Abstract: The widespread use of generative models has created a feedback loop, in which each generation of models is trained on data partially produced by its predecessors. This process has raised concerns about model collapse: a critical degradation in performance caused by repeated training on synthetic data. However, different analyses in the literature have reached different conclusions as to the severity of model collapse. As such, it remains unclear how concerning this phenomenon is, and under which assumptions it can be avoided. To address this, we theoretically study model collapse for maximum likelihood estimation (MLE), in a natural setting where synthetic data is gradually added to the original data set. Under standard assumptions (similar to those long used for proving asymptotic consistency and normality of MLE), we establish non-asymptotic bounds showing that collapse can be avoided even as the fraction of real data vanishes. On the other hand, we prove that some assumptions (beyond MLE consistency) are indeed necessary: without them, model collapse can occur arbitr...
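The iterative setting described in the abstract, in which each generation fits a model by MLE to the accumulated data and then contributes synthetic samples to the pool, can be sketched with a toy Gaussian example. This is a hypothetical illustration of the general feedback loop, not the paper's construction or its bounds; the function name and parameters are made up for this sketch.

```python
import random
import statistics

def iterative_mle_gaussian(real_data, n_generations=20, n_synth_per_gen=100, seed=0):
    """Toy simulation of an iterative-MLE feedback loop: at each generation,
    fit a Gaussian by MLE to the accumulated data set, then append fresh
    synthetic samples drawn from the fitted model. The fraction of real data
    in the pool shrinks toward zero as generations proceed."""
    rng = random.Random(seed)
    data = list(real_data)
    estimates = []
    for _ in range(n_generations):
        mu = statistics.fmean(data)      # MLE of the mean (sample mean)
        sigma = statistics.pstdev(data)  # MLE of the std (population form)
        estimates.append((mu, sigma))
        # Synthetic data from the current generation's fitted model
        data.extend(rng.gauss(mu, sigma) for _ in range(n_synth_per_gen))
    return estimates

# Real data from N(0, 1); track whether the estimates drift across generations
rng = random.Random(42)
real = [rng.gauss(0.0, 1.0) for _ in range(500)]
est = iterative_mle_gaussian(real)
```

In this sketch one can inspect `est` to see how far the final generation's `(mu, sigma)` has drifted from the real-data parameters, which is the kind of degradation the abstract refers to as model collapse.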