
[2602.17846] Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models

arXiv - Machine Learning, February 23, 2026

Summary

This paper studies memorization in diffusion models. It introduces a geometric framework that partitions the noise schedule into three regimes of differing memorization risk and identifies a danger zone at medium noise levels where memorization is most pronounced.

Why It Matters

Knowing when diffusion models memorize rather than generalize matters for privacy, because a model that memorizes can reproduce its training data. The paper links this behavior to the geometry of the training data and to specific parts of the noise schedule, which can inform the design of safer models.

Key Takeaways

Diffusion models can memorize training data, raising privacy issues.
Memorization risk is highly non-uniform across noise levels.
A danger zone for memorization exists at medium noise levels.
Small and large noise regimes resist memorization through different mechanisms.
A geometry-informed intervention can mitigate memorization risks.
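The abstract ties this noise-level dependence partly to the concentration behavior of the posterior. Below is a minimal sketch of that idea under two assumptions this summary does not state: the data distribution is the empirical distribution over a finite training set, and noise is added as x = y + σε with ε ~ N(0, I). Under these assumptions, the posterior over training points given x is a softmax of −‖x − y_i‖² / (2σ²). It collapses onto the nearest training point as σ shrinks and flattens toward uniform as σ grows. The function names `posterior_weights` and `entropy` and the four-point toy data are hypothetical illustrations, not the authors' code.

```python
import math


def posterior_weights(x, train, sigma):
    """Posterior p(y_i | x) for an empirical data distribution over `train`,
    assuming x = y + sigma * eps with eps ~ N(0, I) (an illustrative assumption).

    The weights are softmax(-||x - y_i||^2 / (2 * sigma^2)); the max logit is
    subtracted before exponentiating for numerical stability.
    """
    logits = [
        -sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2) for y in train
    ]
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p):
    """Shannon entropy in nats: 0 means all mass sits on a single training point."""
    return -sum(q * math.log(q) for q in p if q > 0)


if __name__ == "__main__":
    # Hypothetical toy training set: the four corners of a unit square.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    x = (0.2, 0.1)  # a noisy query that lies closest to (0, 0)
    for sigma in (0.05, 0.5, 5.0):
        w = posterior_weights(x, train, sigma)
        print(f"sigma={sigma:<5} weights={[round(v, 3) for v in w]} entropy={entropy(w):.3f}")
```

An entropy near 0 means the posterior is concentrated on one training point, and an entropy near ln 4 ≈ 1.386 means it is spread evenly over all four. This toy does not reproduce the paper's three-regime partition. It only shows the two limiting behaviors of the posterior.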

arXiv:2602.17846 (cs), submitted on 19 Feb 2026

Title: Two Calm Ends and the Wild Middle: A Geometric Picture of Memorization in Diffusion Models

Authors: Nick Dodson, Xinyu Gao, Qingsong Wang, Yusu Wang, Zhengchao Wan

Abstract: Diffusion models generate high-quality samples but can also memorize training data, raising serious privacy concerns. Understanding the mechanisms governing when memorization versus generalization occurs remains an active area of research. In particular, it is unclear where along the noise schedule memorization is induced, how data geometry influences it, and how phenomena at different noise scales interact. We introduce a geometric framework that partitions the noise schedule into three regimes based on the coverage properties of training data by Gaussian shells and the concentration behavior of the posterior, which we argue are two fundamental objects governing memorization and generalization in diffusion models. This perspective reveals that memorization risk is highly non-uniform across noise levels. We further identify a danger zone at medium noise levels where memorization is most pronounced. In contrast, both the small and large noise regimes resist memorization, but through fundamentally different mechanisms: small noise av...
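The "Gaussian shells" in the abstract most likely rest on a standard high-dimensional fact, though this reading of the paper's usage is an assumption. Suppose a training point y is perturbed as x = y + σε with ε ~ N(0, I_d). As the dimension d grows, the distance ‖x − y‖ concentrates tightly around σ√d, so noisy copies of y lie near a thin spherical shell rather than filling a ball. The sketch below checks this numerically with Python's standard library. The function name `shell_radius_stats` and the chosen dimensions are hypothetical.

```python
import math
import random


def shell_radius_stats(d: int, sigma: float, n_samples: int = 2000, seed: int = 0):
    """Return (mean, std) of ||x - y|| for x = y + sigma * eps, eps ~ N(0, I_d).

    Since x - y = sigma * eps, the distance does not depend on y, so only the
    noise vector eps is sampled.
    """
    rng = random.Random(seed)
    radii = []
    for _ in range(n_samples):
        sq_norm = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
        radii.append(sigma * math.sqrt(sq_norm))
    mean = sum(radii) / n_samples
    std = math.sqrt(sum((r - mean) ** 2 for r in radii) / n_samples)
    return mean, std


if __name__ == "__main__":
    sigma = 0.5
    for d in (2, 20, 200):
        mean, std = shell_radius_stats(d, sigma)
        target = sigma * math.sqrt(d)  # predicted shell radius
        print(f"d={d:4d}  mean/target={mean / target:.3f}  std/target={std / target:.3f}")
```

The relative spread std/target should shrink roughly like 1/√(2d), so by d = 200 nearly all noisy samples sit within a few percent of radius σ√d. Real image data has far larger d, which makes the shell picture sharper still.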
