[2508.03587] Zero-Variance Gradients for Variational Autoencoders

arXiv - Machine Learning · 4 min read

Summary

This paper introduces Silent Gradients, an approach to training Variational Autoencoders (VAEs) that eliminates gradient estimation variance by computing the expected ELBO analytically, improving convergence and performance.

Why It Matters

The research addresses a significant challenge in training VAEs by proposing a method that stabilizes the optimization process. By enabling zero-variance gradients, this work has implications for improving the efficiency and effectiveness of generative models, which are widely used in machine learning applications.

Key Takeaways

  • Silent Gradients eliminate gradient estimation variance in VAEs.
  • The proposed method allows the expected ELBO to be computed analytically for suitably restricted decoders.
  • Early encoder learning is guided by analytic gradients before transitioning to stochastic estimators.
  • The approach consistently outperforms standard methods across multiple datasets.
  • Architectural choices can significantly enhance the training stability of generative models.
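The core idea behind the takeaways above, computing the expected ELBO in closed form so that no Monte Carlo samples of the latent variable are needed, can be illustrated with a minimal sketch. The setup below (a linear-Gaussian decoder p(x|z) = N(Wz + b, I), a diagonal-Gaussian encoder q(z|x), and a standard-normal prior) is a hypothetical illustration in the spirit of the paper's linear-decoder analysis, not its exact formulation:

```python
import numpy as np

def analytic_elbo(x, mu, log_var, W, b):
    """Expected ELBO in closed form for a linear-Gaussian decoder.

    Hypothetical setup (for illustration, not the paper's exact model):
      q(z|x) = N(mu, diag(exp(log_var)))
      p(x|z) = N(W z + b, I)
      p(z)   = N(0, I)
    """
    var = np.exp(log_var)
    # E_q[-1/2 ||x - Wz - b||^2] is analytic: a mean residual term plus a
    # trace term tr(W diag(var) W^T) capturing posterior uncertainty.
    resid = x - W @ mu - b
    recon = -0.5 * (resid @ resid + np.sum((W ** 2) @ var))
    recon -= 0.5 * x.size * np.log(2 * np.pi)
    # KL(q || p) between diagonal Gaussians has the usual closed form.
    kl = 0.5 * np.sum(var + mu ** 2 - 1.0 - log_var)
    return recon - kl
```

Because no latent samples are drawn, gradients of this quantity with respect to the encoder outputs (mu, log_var) have zero estimation variance, unlike a reparameterization-trick estimate of the same objective.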

Computer Science > Machine Learning
arXiv:2508.03587 (cs)
[Submitted on 5 Aug 2025 (v1), last revised 26 Feb 2026 (this version, v2)]

Title: Zero-Variance Gradients for Variational Autoencoders
Authors: Zilei Shao, Anji Liu, Guy Van den Broeck

Abstract: Training deep generative models like Variational Autoencoders (VAEs) requires propagating gradients through stochastic latent variables, which introduces estimation variance that can slow convergence and degrade performance. In this paper, we explore an orthogonal direction, which we call Silent Gradients. Instead of designing improved stochastic estimators, we show that by restricting the decoder architecture in specific ways, the expected ELBO can be computed analytically. This yields gradients with zero estimation variance, as we can directly compute the evidence lower bound without resorting to Monte Carlo samples of the latent variables. We first provide a theoretical analysis in a controlled setting with a linear decoder and demonstrate improved optimization compared to standard estimators. To extend this idea to expressive nonlinear decoders, we introduce a training paradigm that uses the analytic gradient to guide early encoder learning before annealing to a standard stochastic estimator. Across multiple datasets, our approach consistently improves established baselines, inc...
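The training paradigm in the abstract, guiding early encoder learning with the analytic gradient before annealing to a standard stochastic estimator, can be sketched as a simple blending schedule. The `warmup` and `decay` step counts below are hypothetical placeholders, not values from the paper:

```python
def anneal_weight(step, warmup=1000, decay=4000):
    """Weight on the analytic (zero-variance) objective at a given step.

    Fully analytic for the first `warmup` steps, then linearly annealed
    toward the stochastic estimator over the next `decay` steps.
    (Schedule shape and constants are illustrative assumptions.)
    """
    if step < warmup:
        return 1.0
    return max(0.0, 1.0 - (step - warmup) / decay)

def blended_neg_elbo(analytic_loss, stochastic_loss, step):
    """Convex combination of the two objectives under the schedule."""
    w = anneal_weight(step)
    return w * analytic_loss + (1.0 - w) * stochastic_loss
```

Early in training the encoder receives only the low-noise analytic signal; once the weight reaches zero, optimization proceeds with the standard stochastic estimator and an unrestricted decoder.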

Related Articles

Nomadic raises $8.4 million to wrangle the data pouring off autonomous vehicles | TechCrunch
Machine Learning

The company turns footage from robots into structured, searchable datasets with a deep learning model.

TechCrunch - AI · 6 min
Machine Learning

[D] Applied AI/Machine learning course by Srikanth Varma

I have all 10 modules of this course, along with all the notes, assignments, and solutions. If anyone needs this course, DM me. submitted b...

Reddit - Machine Learning · 1 min

Art schools are being torn apart by AI | The Verge
Machine Learning

Many students and faculty members are opposed to using the technology, but art schools are plowing ahead with teaching AI tools regardless.

The Verge - AI · 9 min
AI Has Flooded All the Weather Apps | WIRED
Machine Learning

Weather forecasting has gotten a big boost from machine learning. How that translates into what users see can vary.

Wired - AI · 8 min
