[2602.22610] DP-aware AdaLN-Zero: Taming Conditioning-Induced Heavy-Tailed Gradients in Differentially Private Diffusion
Summary
The paper introduces DP-aware AdaLN-Zero, a sensitivity-aware conditioning mechanism that mitigates heavy-tailed per-example gradients in differentially private diffusion models, improving utility under a fixed privacy budget.
Why It Matters
This research addresses a critical challenge in machine learning regarding the balance between privacy and model performance. By improving gradient handling in differentially private settings, it opens pathways for more effective applications in sensitive data environments, particularly in time-series tasks.
Key Takeaways
- DP-aware AdaLN-Zero improves gradient management in private diffusion models.
- The mechanism reduces the impact of outlier-driven gradients on model updates.
- Empirical results show enhanced performance in interpolation and forecasting tasks.
- Sensitivity-aware conditioning can lead to better privacy-preserving training.
- The approach maintains expressiveness in standard non-private training.
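To see why outlier-driven gradients are a problem in the first place, the sketch below illustrates the standard DP-SGD step (per-example gradient clipping followed by Gaussian noise). It is a generic textbook recipe, not code from the paper; the function name and parameters are illustrative. With one heavy-tailed gradient among many small ones, the clipped outlier still contributes far more mass to the averaged update than any typical example.

```python
import numpy as np

def dp_sgd_update(per_example_grads, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Illustrative DP-SGD step: clip each per-example gradient to
    `clip_norm`, average, then add Gaussian noise (hypothetical helper,
    not the paper's implementation)."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        n = np.linalg.norm(g)
        # Global norm clipping: outliers are scaled down to clip_norm,
        # but still dominate if typical gradients are much smaller.
        clipped.append(g * min(1.0, clip_norm / max(n, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(per_example_grads),
                       size=mean.shape)
    return mean + noise

# One conditioning-induced outlier (norm 100) among nine small gradients
# (norm 0.1): after clipping to 1.0, the outlier alone supplies more than
# half of the averaged update, biasing the step toward the rare context.
grads = [np.array([100.0, 0.0])] + [np.array([0.1, 0.0])] * 9
update = dp_sgd_update(grads, clip_norm=1.0, noise_mult=0.0)
```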
Computer Science > Machine Learning
arXiv:2602.22610 (cs)
[Submitted on 26 Feb 2026]
Authors: Tao Huang, Jiayang Meng, Xu Yang, Chen Hou, Hong Chen
Abstract
Condition injection enables diffusion models to generate context-aware outputs, which is essential for many time-series tasks. However, heterogeneous conditional contexts (e.g., observed history, missingness patterns, or outlier covariates) can induce heavy-tailed per-example gradients. Under Differentially Private Stochastic Gradient Descent (DP-SGD), these rare conditioning-driven heavy-tailed gradients disproportionately trigger global clipping, resulting in outlier-dominated updates, larger clipping bias, and degraded utility under a fixed privacy budget. In this paper, we propose DP-aware AdaLN-Zero, a drop-in sensitivity-aware conditioning mechanism for conditional diffusion transformers that limits conditioning-induced gain without modifying the DP-SGD mechanism. DP-aware AdaLN-Zero jointly constrains conditioning representation magnitude and AdaLN modulation parameters via bounded re-parameterization, suppressing extreme gradient tail events before gradient clipping and noise injection. Empirically, DP-...
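The abstract's core idea, bounding both the conditioning representation's magnitude and the AdaLN modulation gain via re-parameterization, can be sketched as follows. This is a minimal single-token NumPy illustration under assumptions of our own (the bound constants `c_max`/`s_max`, the tanh re-parameterization, and the function name are hypothetical), not the paper's actual architecture.

```python
import numpy as np

def bounded_adaln_zero(x, cond, W_mod, s_max=1.0, c_max=1.0):
    """Hypothetical sketch of a sensitivity-aware AdaLN-Zero step.

    x:     (d,)    hidden activations for one token
    cond:  (k,)    conditioning embedding
    W_mod: (2d, k) modulation projection; zero-initialised in the
                   AdaLN-Zero convention, so modulation starts as identity
    """
    # 1) Bound the conditioning representation's norm so outlier contexts
    #    (e.g., extreme covariates) cannot inflate the modulation input.
    norm = np.linalg.norm(cond)
    cond = cond * (c_max / max(norm, c_max))

    # 2) Project to scale/shift; the tanh re-parameterization keeps the
    #    per-channel gain within [-s_max, s_max], suppressing gradient
    #    tail events before DP-SGD clipping ever sees them.
    raw = W_mod @ cond
    d = x.shape[0]
    gamma = s_max * np.tanh(raw[:d])
    beta = s_max * np.tanh(raw[d:])

    # 3) LayerNorm followed by bounded modulation.
    h = (x - x.mean()) / (x.std() + 1e-6)
    return (1.0 + gamma) * h + beta
```

With `W_mod` zero-initialised, `gamma` and `beta` are exactly zero and the block reduces to plain LayerNorm, matching the "maintains expressiveness in standard non-private training" takeaway: the bounds only bite when the conditioning pathway tries to apply extreme gains.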