[2602.20646] On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes

arXiv - Machine Learning

Summary

This paper analyzes the convergence of Stochastic Gradient Descent (SGD) under perturbations in both forward and backward passes, providing theoretical insights and experimental validations.

Why It Matters

Understanding how perturbations affect SGD is crucial for improving optimization techniques in machine learning, particularly in deep learning where gradient noise can lead to instability. This research offers a theoretical framework that can help practitioners better manage training dynamics.

Key Takeaways

  • Perturbations in SGD can propagate and amplify through computational graphs, affecting convergence.
  • The paper provides convergence guarantees for non-convex objectives and conditions for stability.
  • Experimental results validate the theoretical findings, illustrating the behavior of gradient spikes.

Mathematics > Optimization and Control
arXiv:2602.20646 (math) [Submitted on 24 Feb 2026]

Title: On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes
Authors: Boao Kong, Hengrui Zhang, Kun Yuan

Abstract: We study stochastic gradient descent (SGD) for composite optimization problems with $N$ sequential operators subject to perturbations in both the forward and backward passes. Unlike classical analyses that treat gradient noise as additive and localized, perturbations to intermediate outputs and gradients cascade through the computational graph, compounding geometrically with the number of operators. We present the first comprehensive theoretical analysis of this setting. Specifically, we characterize how forward and backward perturbations propagate and amplify within a single gradient step, derive convergence guarantees for both general non-convex objectives and functions satisfying the Polyak--Łojasiewicz condition, and identify conditions under which perturbations do not deteriorate the asymptotic convergence order. As a byproduct, our analysis furnishes a theoretical explanation for the gradient spiking phenomenon widely observed in deep learning, precisely characterizing the conditions under which training recovers from spikes or diverges. Experiments on lo...
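The setting described in the abstract can be illustrated with a toy sketch: a composition of $N$ scalar linear operators whose intermediate forward outputs and backward gradients each receive additive noise, so perturbations cascade through the chain in both passes. This is a minimal illustration under assumed names and parameters (`a`, `sigma_fwd`, `sigma_bwd` are not from the paper), not the paper's actual analysis or algorithm.

```python
import random

# Toy example: minimize f(w) = 0.5 * (g_N ∘ ... ∘ g_1(w))^2, where each
# operator g_i(x) = a_i * x is linear. Noise is injected into every
# intermediate forward output and every backward gradient, so a single
# perturbation cascades through the remaining operators in the chain.
random.seed(0)
a = [0.9, 1.1, 0.95, 1.05]  # illustrative operator slopes (product near 1)

def perturbed_grad(w, sigma_fwd, sigma_bwd):
    # Forward pass: each intermediate output picks up additive noise.
    x = w
    for ai in a:
        x = ai * x + random.gauss(0.0, sigma_fwd)
    # Loss f = 0.5 * x^2, so the gradient at the final output is x itself.
    g = x
    # Backward pass: each propagated gradient also picks up additive noise.
    for ai in reversed(a):
        g = ai * g + random.gauss(0.0, sigma_bwd)
    return g  # noisy estimate of df/dw

w, lr = 5.0, 0.05
for _ in range(500):
    w -= lr * perturbed_grad(w, sigma_fwd=0.01, sigma_bwd=0.01)

print(f"final w = {w:.4f}")  # small perturbations: w still approaches 0
```

With mild noise the iterate still settles near the minimizer, while scaling up `sigma_fwd`/`sigma_bwd` or the operator slopes makes the cascaded noise dominate, loosely mirroring the amplification and spike-recovery behavior the paper formalizes.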

