[2511.07270] High-Dimensional Asymptotics of Differentially Private PCA
Summary
This paper investigates the high-dimensional asymptotics of differentially private PCA, characterizing the smallest noise level required to meet a target privacy guarantee while preserving data utility.
Why It Matters
Understanding the balance between privacy and data utility is crucial in machine learning, especially in high-dimensional settings. This research provides sharper privacy characterizations that could lead to more efficient algorithms in differentially private data analysis, impacting fields that rely on sensitive data.
Key Takeaways
- The paper explores optimal noise levels in differentially private PCA.
- It challenges existing pessimistic privacy bounds that may lead to excessive noise.
- Sharp privacy characterizations are provided for high-dimensional datasets.
- The analysis combines hypothesis-testing and classical contiguity arguments.
- Findings could enhance the utility of privatized data in machine learning applications.
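The first two takeaways can be made concrete with the classical Gaussian-mechanism noise calibration, a sufficient but often conservative bound of exactly the kind the paper argues may demand more noise than is truly necessary. This is a generic illustration, not the paper's exponential-mechanism analysis; the function name is ours.

```python
import math

def gaussian_mechanism_sigma(sensitivity, epsilon, delta):
    """Standard sufficient noise scale for (epsilon, delta)-DP via the
    Gaussian mechanism: sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon.

    Worst-case bounds like this one guarantee privacy, but the paper's
    point is that in high dimensions they can be pessimistic, i.e. the
    smallest sigma that actually suffices may be lower.
    """
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
```

For a fixed sensitivity and delta, the required noise scale shrinks as the privacy budget epsilon grows, which is the trade-off the takeaways describe.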
arXiv:2511.07270 [math.ST] (Mathematics > Statistics Theory)
Submitted on 10 Nov 2025 (v1), last revised 15 Feb 2026 (this version, v2)
Title: High-Dimensional Asymptotics of Differentially Private PCA
Authors: Youngjoo Yun, Rishabh Dudeja
Abstract: In differential privacy, statistics of a sensitive dataset are privatized by introducing random noise. Most privacy analyses provide privacy bounds specifying a noise level sufficient to achieve a target privacy guarantee. Sometimes, these bounds are pessimistic and suggest adding excessive noise, which overwhelms the meaningful signal. It remains unclear whether such high noise levels are truly necessary or a limitation of the proof techniques. This paper explores whether we can obtain sharp privacy characterizations that identify the smallest noise level required to achieve a target privacy level for a given mechanism. We study this problem in the context of differentially private principal component analysis, where the goal is to privatize the leading principal components (PCs) of a dataset with n samples and p features. We analyze the exponential mechanism for this problem in a model-free setting and provide sharp utility and privacy characterizations in the high-dimensional limit ($p\rightarrow\infty$). Our privacy result shows that, in high dimensions, detecting the presence of a target individual in...
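To make the abstract's setup concrete, here is a minimal sketch of differentially private PCA via input perturbation: symmetric Gaussian noise is added to the sample covariance before eigendecomposition. This is a simpler baseline, not the exponential mechanism the paper analyzes; the unit-norm-row assumption and function name are ours, chosen so the covariance's replace-one-sample sensitivity is bounded.

```python
import numpy as np

def dp_pca_input_perturbation(X, epsilon, delta, rng=None):
    """Privatized leading principal component via the Gaussian mechanism.

    Sketch only: rows are rescaled to unit L2 norm, so replacing one
    sample changes the covariance X^T X / n by at most 2/n in Frobenius
    norm. Noise is calibrated with the standard (conservative) bound.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    # Normalize rows so the sensitivity bound below holds.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    cov = X.T @ X / n
    sensitivity = 2.0 / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    # Symmetric Gaussian noise matrix, so the perturbed matrix stays symmetric.
    noise = rng.normal(scale=sigma, size=(p, p))
    noise = (noise + noise.T) / np.sqrt(2.0)
    eigvals, eigvecs = np.linalg.eigh(cov + noise)
    return eigvecs[:, -1]  # eigenvector of the largest eigenvalue
```

The paper's high-dimensional analysis asks how small sigma can be made, for a given mechanism, while still hiding whether any one individual contributed a row to X.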