[2603.23792] Manifold Generalization Provably Proceeds Memorization in Diffusion Models
Computer Science > Machine Learning
arXiv:2603.23792 (cs)
[Submitted on 24 Mar 2026]

Title: Manifold Generalization Provably Proceeds Memorization in Diffusion Models
Authors: Zebang Shen, Ya-Ping Hsieh, Niao He

Abstract: Diffusion models often generate novel samples even when the learned score is only \emph{coarse} -- a phenomenon not accounted for by the standard view of diffusion training as density estimation. In this paper, we show that, under the \emph{manifold hypothesis}, this behavior can instead be explained by coarse scores capturing the \emph{geometry} of the data while discarding the fine-scale distributional structure of the population measure~$\mu_{\scriptscriptstyle\mathrm{data}}$. Concretely, whereas estimating the full data distribution $\mu_{\scriptscriptstyle\mathrm{data}}$ supported on a $k$-dimensional manifold is known to require the classical minimax rate $\tilde{\mathcal{O}}(N^{-1/k})$, we prove that diffusion models trained with coarse scores can exploit the \emph{regularity of the manifold support} and attain a near-parametric rate toward a \emph{different} target distribution. This target distribution has density uniformly comparable to that of~$\mu_{\scriptscriptstyle\mathrm{data}}$ throughout any $\tilde{\mathcal{O}}\bigl(N^{-\beta/(4k)}\bigr)$-neighborhood of the manifold, wher...
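As a reading aid, the two convergence rates contrasted in the abstract can be set side by side. The excerpt does not name the error metric or the exact near-parametric exponent, so a generic distance $d$ and the conventional reading of "near-parametric" as close to $N^{-1/2}$ are used here as placeholders:

```latex
\[
  \underbrace{d\bigl(\hat{\mu}_N,\, \mu_{\mathrm{data}}\bigr)
      = \tilde{\mathcal{O}}\bigl(N^{-1/k}\bigr)}_{\text{full density estimation on a $k$-manifold (minimax)}}
  \qquad\text{vs.}\qquad
  \underbrace{d\bigl(\hat{\nu}_N,\, \nu\bigr)
      = \tilde{\mathcal{O}}\bigl(N^{-c}\bigr),\ \ c \text{ near } \tfrac12}_{\text{coarse-score diffusion toward the target } \nu}
\]
```

Here $\nu$ denotes the abstract's alternative target distribution, whose density is uniformly comparable to that of $\mu_{\mathrm{data}}$ on an $\tilde{\mathcal{O}}\bigl(N^{-\beta/(4k)}\bigr)$-neighborhood of the manifold; for large $k$ the left-hand rate degrades badly (curse of dimensionality), while the right-hand rate does not depend on $k$ in its exponent.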