[2508.03587] Zero-Variance Gradients for Variational Autoencoders
Summary
This paper introduces Silent Gradients, an approach to training Variational Autoencoders (VAEs) that eliminates gradient estimation variance by computing the expected ELBO analytically, improving convergence and final performance.
Why It Matters
The research addresses a significant challenge in training VAEs by proposing a method that stabilizes the optimization process. By enabling zero-variance gradients, this work has implications for improving the efficiency and effectiveness of generative models, which are widely used in machine learning applications.
Key Takeaways
- Silent Gradients eliminate gradient estimation variance in VAEs.
- The proposed method allows analytical computation of the expected ELBO for suitably restricted decoder architectures.
- Early encoder learning is guided by analytic gradients before transitioning to stochastic estimators.
- The approach consistently outperforms standard methods across multiple datasets.
- Architectural choices can significantly enhance the training stability of generative models.
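To make the "analytic expected ELBO" takeaway concrete, here is a minimal sketch of the key identity for a linear Gaussian decoder: with decoder x̂ = Wz + b and Gaussian posterior q(z|x) = N(μ, diag(σ²)), the expected squared reconstruction error has a closed form, E‖x − (Wz + b)‖² = ‖x − (Wμ + b)‖² + Σⱼ σⱼ² ‖W₍:,ⱼ₎‖². All dimensions and values below are made up for illustration; this is not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen only for illustration.
d_x, d_z = 8, 3
W = rng.normal(size=(d_x, d_z))        # linear decoder weights
b = rng.normal(size=d_x)               # linear decoder bias
x = rng.normal(size=d_x)               # one data point
mu = rng.normal(size=d_z)              # posterior mean: q(z|x) = N(mu, diag(sigma^2))
sigma = np.exp(0.1 * rng.normal(size=d_z))

# Analytic expectation (zero estimation variance):
#   E_q ||x - (W z + b)||^2
#     = ||x - (W mu + b)||^2 + sum_j sigma_j^2 * ||W[:, j]||^2
analytic = np.sum((x - (W @ mu + b)) ** 2) + np.sum(sigma**2 * np.sum(W**2, axis=0))

# Standard Monte Carlo estimate via the reparameterization trick: z = mu + sigma * eps.
def mc_estimate(n_samples):
    eps = rng.normal(size=(n_samples, d_z))
    z = mu + sigma * eps
    err = x - (z @ W.T + b)
    return np.mean(np.sum(err**2, axis=1))

print("analytic       :", analytic)
print("MC, 10 samples :", mc_estimate(10))       # noisy
print("MC, 1e5 samples:", mc_estimate(100_000))  # converges toward the analytic value
```

With few samples the Monte Carlo estimate fluctuates around the analytic value; the analytic form removes that fluctuation entirely, which is the sense in which the resulting gradients are "zero-variance."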
Computer Science > Machine Learning
arXiv:2508.03587 (cs)
[Submitted on 5 Aug 2025 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: Zero-Variance Gradients for Variational Autoencoders
Authors: Zilei Shao, Anji Liu, Guy Van den Broeck
Abstract: Training deep generative models like Variational Autoencoders (VAEs) requires propagating gradients through stochastic latent variables, which introduces estimation variance that can slow convergence and degrade performance. In this paper, we explore an orthogonal direction, which we call Silent Gradients. Instead of designing improved stochastic estimators, we show that by restricting the decoder architecture in specific ways, the expected ELBO can be computed analytically. This yields gradients with zero estimation variance, as we can directly compute the evidence lower bound without resorting to Monte Carlo samples of the latent variables. We first provide a theoretical analysis in a controlled setting with a linear decoder and demonstrate improved optimization compared to standard estimators. To extend this idea to expressive nonlinear decoders, we introduce a training paradigm that uses the analytic gradient to guide early encoder learning before annealing to a standard stochastic estimator. Across multiple datasets, our approach consistently improves established baselines, inc...
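The abstract's training paradigm for nonlinear decoders (analytic gradients early, annealing to a standard stochastic estimator) can be sketched as a simple loss-mixing schedule. The linear ramp, step counts, and function names here are assumptions for illustration, not the paper's exact schedule.

```python
def anneal_weight(step, warmup_steps=1000, ramp_steps=4000):
    """Weight on the stochastic estimator: 0 during warmup (pure analytic
    gradients guide the encoder), then a linear ramp up to 1 (pure stochastic
    estimator). The shape and step counts are illustrative choices."""
    if step < warmup_steps:
        return 0.0
    return min(1.0, (step - warmup_steps) / ramp_steps)

def training_loss(step, analytic_elbo, stochastic_elbo):
    """Negative ELBO as the loss, mixing the analytic and stochastic terms
    according to the annealing schedule."""
    a = anneal_weight(step)
    return -((1.0 - a) * analytic_elbo + a * stochastic_elbo)

# Early training uses only the analytic term; late training only the stochastic one.
print(anneal_weight(0), anneal_weight(3000), anneal_weight(10_000))
```

A schedule like this lets the low-variance analytic signal stabilize early optimization before the more expressive (but noisier) stochastic estimator takes over.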