[2602.18718] Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space
Summary
This paper shows that black-box variational inference (BBVI), when equipped with Price's gradient estimator, attains the same state-of-the-art iteration complexity guarantees previously established only for Wasserstein variational inference (WVI), by leveraging second-order information (Hessians) of the target log-density.
Why It Matters
The study closes a gap in convergence guarantees between measure-space (Bures-Wasserstein) and parameter-space variational inference, showing that WVI's previously reported advantage stems from its gradient estimator rather than from the measure-space formulation. This matters for machine learning practitioners seeking efficient inference techniques, particularly in complex models.
Key Takeaways
- Price's gradient estimator, which exploits Hessians of the target log-density, accounts for WVI's previously reported advantage.
- Wasserstein VI (WVI) and black-box VI (BBVI) achieve identical state-of-the-art iteration complexities.
- The study empirically validates the advantages of using second-order information in variational inference.
- BBVI can leverage Price's gradient estimator with minor modifications, broadening the estimator's applicability.
- Understanding these methods can enhance model training efficiency in machine learning applications.
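As a concrete illustration of the reparametrization gradient discussed above, here is a minimal sketch (an assumption for illustration, not the paper's code): for a Gaussian variational family with x = μ + Lε, ε ~ N(0, I), the gradient of E[f(x)] with respect to μ is E[∇f(μ + Lε)]. The quadratic target f(x) = xᵀAx, for which the closed-form gradient is 2Aμ, is chosen purely so the estimate can be checked.

```python
import numpy as np

# Sketch of the reparametrization gradient for a Gaussian variational
# family (illustrative; not the paper's implementation).
# With x = mu + L @ eps and eps ~ N(0, I):  grad_mu E[f(x)] = E[grad f(x)].

rng = np.random.default_rng(0)
d = 2
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric, defines f(x) = x^T A x
mu = np.array([1.0, -1.0])
L = np.eye(d)                           # Cholesky factor of Sigma = I

def grad_f(x):
    # f(x) = x^T A x  =>  grad f(x) = 2 A x
    return 2.0 * A @ x

eps = rng.standard_normal((10_000, d))
samples = mu + eps @ L.T
grad_mu_est = np.mean([grad_f(x) for x in samples], axis=0)

# Closed form for the quadratic target: grad_mu E[f] = 2 A mu
print(grad_mu_est)
```

Note this estimator only needs first-order information (gradients) of the target, which is the variant the paper's improved analysis of BBVI moves beyond.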
Statistics > Machine Learning
arXiv:2602.18718 (stat)
[Submitted on 21 Feb 2026]
Title: Stochastic Gradient Variational Inference with Price's Gradient Estimator from Bures-Wasserstein to Parameter Space
Authors: Kyurae Kim, Qiang Fu, Yi-An Ma, Jacob R. Gardner, Trevor Campbell
Abstract: For approximating a target distribution given only its unnormalized log-density, stochastic gradient-based variational inference (VI) algorithms are a popular approach. For example, Wasserstein VI (WVI) and black-box VI (BBVI) perform gradient descent in measure space (Bures-Wasserstein space) and parameter space, respectively. Previously, for the Gaussian variational family, convergence guarantees for WVI have shown superiority over existing results for black-box VI with the reparametrization gradient, suggesting the measure space approach might provide some unique benefits. In this work, however, we close this gap by obtaining identical state-of-the-art iteration complexity guarantees for both. In particular, we identify that WVI's superiority stems from the specific gradient estimator it uses, which BBVI can also leverage with minor modifications. The estimator in question is usually associated with Price's theorem and utilizes second-order information (Hessians) of the target log-density. We will refe...
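The Price's-theorem estimator the abstract refers to can be sketched as follows (a hedged illustration, not the authors' code). Price's theorem gives ∇_Σ E_{N(μ,Σ)}[f(x)] = ½ E[∇²f(x)], so the covariance gradient can be estimated from Hessian samples. For the quadratic test function f(x) = xᵀAx (an assumption chosen for checkability), the Hessian is the constant 2A and the Monte Carlo estimate recovers the true gradient A exactly.

```python
import numpy as np

# Sketch of the Price's-theorem covariance-gradient estimator
# (illustrative assumption, not the paper's implementation):
#   grad_Sigma E_{N(mu, Sigma)}[f(x)] = 0.5 * E[Hessian f(x)]

rng = np.random.default_rng(0)
d = 3
A = np.array([[3.0, 0.2, 0.0],
              [0.2, 2.0, 0.5],
              [0.0, 0.5, 1.0]])  # symmetric, defines f(x) = x^T A x
mu = np.zeros(d)
Sigma = np.eye(d)

def hess_f(x):
    # f(x) = x^T A x has the constant Hessian 2A.
    return 2.0 * A

L = np.linalg.cholesky(Sigma)
samples = mu + rng.standard_normal((100, d)) @ L.T
price_grad = 0.5 * np.mean([hess_f(x) for x in samples], axis=0)

# Since E[f] = mu^T A mu + tr(A Sigma), the true gradient w.r.t. Sigma is A;
# with a constant Hessian the Monte Carlo estimate is exact here.
print(price_grad)
```

The design point matching the abstract: this estimator consumes second-order information (Hessians) of the target, in contrast to the first-order reparametrization gradient, and it is this estimator choice that drives WVI's stronger guarantees.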