[2602.22936] Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks

arXiv - Machine Learning 3 min read Article

Summary

This paper derives generalization bounds for Stochastic Gradient Descent (SGD) in homogeneous neural networks, showing that a slower stepsize decay of order $\Omega(1/\sqrt{t})$ still yields stability-based guarantees, relaxing the classical $\mathcal{O}(1/t)$ requirement that can impede optimization.

Why It Matters

Understanding generalization bounds is crucial for improving the performance of machine learning models. This research provides insights into optimizing SGD, which is widely used in training neural networks, thereby potentially enhancing model accuracy and efficiency.

Key Takeaways

  • Algorithmic stability is among the most potent techniques for generalization analysis in machine learning.
  • In the homogeneous-network regime, a slower stepsize decay of order $\Omega(1/\sqrt{t})$ suffices for stability-based bounds in non-convex training, relaxing the classical $\mathcal{O}(1/t)$ requirement.
  • The findings apply broadly, since homogeneous networks include fully-connected and convolutional networks with ReLU and LeakyReLU activations.

Computer Science > Machine Learning

arXiv:2602.22936 (cs) [Submitted on 26 Feb 2026]

Title: Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks

Authors: Wenquan Ma, Yang Sui, Jiaye Teng, Bohan Wang, Jing Xu, Jingqin Yang

Abstract: Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes iterations. This rigid decay of the stepsize potentially impedes optimization and may not align with practical scenarios. In this paper, we derive the generalization bounds under the homogeneous neural network regimes, proving that this regime enables slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results from several aspects, e.g., non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.

Subjects: Machine Learning (cs.LG)

Cite as: arXiv:2602.22936 [cs.LG] (or arXiv:2602.22936v1 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2602.22936
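The two stepsize schedules the abstract contrasts can be illustrated with a toy SGD run. This is a minimal sketch on a made-up one-dimensional non-convex loss (not the paper's setting or analysis); the function `noisy_grad`, the constants, and the loss itself are all illustrative assumptions:

```python
import math
import random

def noisy_grad(w, rng):
    # Gradient of a toy non-convex loss f(w) = w^4 - 3w^2
    # (minima at w = ±sqrt(3/2)), perturbed with Gaussian noise
    # to mimic mini-batch sampling.
    return 4 * w**3 - 6 * w + rng.gauss(0.0, 0.1)

def sgd(schedule, steps=1000, w0=2.0, seed=0):
    # Plain SGD with a time-dependent stepsize eta_t = schedule(t).
    rng = random.Random(seed)
    w = w0
    for t in range(1, steps + 1):
        w -= schedule(t) * noisy_grad(w, rng)
    return w

# Classical stability analyses require eta_t = O(1/t); the paper
# argues that Omega(1/sqrt(t)) decay suffices for homogeneous networks.
fast_decay = lambda t: 0.05 / t              # eta_t = O(1/t)
slow_decay = lambda t: 0.05 / math.sqrt(t)   # eta_t = Omega(1/sqrt(t))

print("O(1/t):      ", sgd(fast_decay))
print("O(1/sqrt(t)):", sgd(slow_decay))
```

The point of the sketch is only the schedule shapes: the $1/\sqrt{t}$ schedule keeps the stepsize substantially larger at late iterations (e.g. at $t=1000$, $0.05/\sqrt{t} \approx 0.0016$ versus $0.05/t = 0.00005$), which is why the rigid $1/t$ decay can stall optimization in practice.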
