[2509.22007] Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Summary
This paper analyzes the dynamics of Classifier-Free Guidance (CFG) in diffusion models under multimodal conditional distributions. It shows that sampling unfolds in three successive stages (Direction Shift, Mode Separation, and Concentration), and that each stage affects sample diversity in a different way.
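For reference, the widely used form of CFG combines the conditional and unconditional noise predictions with a guidance weight w. The notation below is the common convention and may differ from the paper's; the original Ho and Salimans formulation writes the weight as 1 + w instead. Setting w = 1 recovers the conditional model, and w > 1 extrapolates away from the unconditional prediction. The equivalent score-form line shows that the guided field is the gradient of a tilted density p_t(x_t | c)^w p_t(x_t)^(1-w):

```latex
\tilde{\epsilon}(x_t, c) = \epsilon_\theta(x_t, \varnothing) + w\,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)

\nabla_{x_t} \log \tilde{p}_t(x_t \mid c)
  = \nabla_{x_t} \log p_t(x_t) + w\,\bigl(\nabla_{x_t} \log p_t(x_t \mid c) - \nabla_{x_t} \log p_t(x_t)\bigr)
  = \nabla_{x_t} \log\!\left[\, p_t(x_t \mid c)^{w}\, p_t(x_t)^{1-w} \right]
```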
Why It Matters
Stronger guidance is known to improve semantic alignment with the condition while reducing sample diversity, but why this happens has been only partially explained. By showing when during sampling global diversity (across modes) and fine-grained diversity (within a mode) are lost, this analysis clarifies the trade-off and suggests how guidance strength could be scheduled to balance alignment and diversity.
Key Takeaways
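- Stronger CFG improves conditional fidelity (semantic alignment) but reduces output diversity.
- Sampling unfolds in three stages: Direction Shift (guidance accelerates movement toward the weighted mean, introducing initialization bias and norm growth), Mode Separation (local dynamics are largely neutral, but the inherited bias suppresses weaker modes), and Concentration (guidance amplifies within-mode contraction).
- Strong guidance early erodes global diversity, while strong guidance late suppresses fine-grained variability; a time-varying guidance schedule can therefore improve both quality and diversity.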
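One way to make the last takeaway concrete is a piecewise-constant guidance weight keyed to the three stages. This is a hypothetical sketch, not the schedule proposed in the paper: the function name `stage_guidance`, the stage boundaries `t_shift` and `t_conc`, and all weight values are illustrative assumptions. In practice the boundaries would depend on the noise schedule and the data. The defaults keep guidance mild in the early and late stages, where the abstract reports that strong guidance hurts global and fine-grained diversity respectively.

```python
def stage_guidance(t, w_shift=1.5, w_sep=4.0, w_conc=1.5, t_shift=0.7, t_conc=0.3):
    """Hypothetical stage-wise CFG weight (illustrative values, not from the paper).

    t is normalised diffusion time in [0, 1]: t = 1 is pure noise, t = 0 is clean data.
    Assumed stage boundaries:
      t >= t_shift           -> Direction Shift: mild guidance to limit early bias
      t_conc <= t < t_shift  -> Mode Separation: stronger guidance
      t < t_conc             -> Concentration: mild guidance to keep fine detail
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if not 0.0 <= t_conc <= t_shift <= 1.0:
        raise ValueError("need 0 <= t_conc <= t_shift <= 1")
    if t >= t_shift:
        return w_shift
    if t >= t_conc:
        return w_sep
    return w_conc


# Example: guidance weights along a 10-step sampler running from t = 1 down to t = 0.
timesteps = [1.0 - i / 9 for i in range(10)]
schedule = [stage_guidance(t) for t in timesteps]
print([round(t, 2) for t in timesteps])
print(schedule)
```

At each sampling step the returned weight would replace the constant w in the usual CFG combination. A piecewise-constant shape is only the simplest choice; smooth ramps between stages are an equally plausible design.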
arXiv:2509.22007 (cs), Computer Science > Machine Learning. Submitted on 26 Sep 2025 (v1); last revised 18 Feb 2026 (this version, v2).
Authors: Cheng Jin, Qitan Shi, Yuantao Gu
Abstract: Classifier-Free Guidance (CFG) is widely used to improve conditional fidelity in diffusion models, but its impact on sampling dynamics remains poorly understood. Prior studies, often restricted to unimodal conditional distributions or simplified cases, provide only a partial picture. We analyze CFG under multimodal conditionals and show that the sampling process unfolds in three successive stages. In the Direction Shift stage, guidance accelerates movement toward the weighted mean, introducing initialization bias and norm growth. In the Mode Separation stage, local dynamics remain largely neutral, but the inherited bias suppresses weaker modes, reducing global diversity. In the Concentration stage, guidance amplifies within-mode contraction, diminishing fine-grained variability. This unified view explains a widely observed phenomenon: stronger guidance improves semantic alignment but inevitably reduces diversity. Experiments support these predictions, showing that early strong guidance erodes global diversity, while late strong guidance suppresses fine-grained...
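The weaker-mode suppression and within-mode contraction described in the abstract can be reproduced qualitatively in a 1-D toy. This is a minimal sketch under stated assumptions, not the paper's experiment. The conditional is a hypothetical two-mode Gaussian mixture (weights 0.7 and 0.3), and the unconditional is a single broad Gaussian. Scores are exact mixture scores under a variance-exploding forward process, and sampling uses Euler steps of the probability-flow ODE with the CFG-combined score. All constants and the names `mixture_score`, `sample_cfg`, and `summarize` are illustrative choices.

```python
import math
import random

# Hypothetical toy distributions (assumptions for illustration, not from the paper).
# Each component is (weight, mean, variance).
COND = [(0.7, -2.0, 0.25), (0.3, 2.0, 0.25)]  # multimodal p(x | c); weaker mode at +2
UNCOND = [(1.0, 0.0, 9.0)]                     # broad unconditional p(x)


def mixture_score(x, mixture, sigma):
    """Exact score d/dx log p_sigma(x) of a 1-D Gaussian mixture convolved
    with N(0, sigma^2), i.e. its marginal under a variance-exploding process."""
    log_terms, grads = [], []
    for weight, mean, var in mixture:
        v = var + sigma ** 2
        log_terms.append(math.log(weight) - 0.5 * math.log(v) - 0.5 * (x - mean) ** 2 / v)
        grads.append((mean - x) / v)
    top = max(log_terms)
    resp = [math.exp(t - top) for t in log_terms]  # unnormalised responsibilities
    return sum(r * g for r, g in zip(resp, grads)) / sum(resp)


def sample_cfg(w, n=1000, steps=200, sigma_max=40.0, sigma_min=0.01, seed=0):
    """Euler steps of the probability-flow ODE dx/dsigma = -sigma * score on a
    geometric sigma grid, using the CFG score s_u + w * (s_c - s_u)."""
    rng = random.Random(seed)
    ratio = (sigma_min / sigma_max) ** (1.0 / (steps - 1))
    sigmas = [sigma_max * ratio ** i for i in range(steps)]
    samples = []
    for _ in range(n):
        x = rng.gauss(0.0, sigma_max)  # approximates the marginal at sigma_max
        for s_now, s_next in zip(sigmas, sigmas[1:]):
            s_u = mixture_score(x, UNCOND, s_now)
            s_c = mixture_score(x, COND, s_now)
            x += (s_now - s_next) * s_now * (s_u + w * (s_c - s_u))
        samples.append(x)
    return samples


def summarize(samples):
    """Fraction of samples in the weaker (+2) mode and std of the dominant (-2) mode."""
    weak_frac = sum(x > 0 for x in samples) / len(samples)
    dominant = [x for x in samples if x <= 0]
    mean = sum(dominant) / len(dominant)
    std = math.sqrt(sum((x - mean) ** 2 for x in dominant) / len(dominant))
    return weak_frac, std


# w = 1 samples the conditional itself; w = 4 is "strong" guidance in this toy.
results = {w: summarize(sample_cfg(w)) for w in (1.0, 4.0)}
for w, (frac, std) in results.items():
    print(f"w={w}: weaker-mode fraction={frac:.3f}, dominant-mode std={std:.3f}")
```

With these assumed settings, the w = 1 run should place roughly 30% of samples in the weaker mode. The w = 4 run should place fewer there and produce a much tighter dominant mode, mirroring the Mode Separation and Concentration effects. Exact numbers depend on the toy's constants, sample count, and step count.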