[2602.22936] Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
Summary
This paper derives generalization bounds for Stochastic Gradient Descent (SGD) in homogeneous neural networks, proving that this regime admits a slower stepsize decay of order $\Omega(1/\sqrt{t})$, rather than the $\mathcal{O}(1/t)$ decay usually required by stability-based analyses.
Why It Matters
Understanding when SGD generalizes is central to training reliable machine learning models. Stability-based analyses typically force a rapidly decaying stepsize that can impede optimization; by relaxing this requirement for homogeneous networks, this work brings the theory closer to the stepsize schedules practitioners actually use.
Key Takeaways
- Algorithmic stability is essential for generalization analysis in machine learning.
- In the homogeneous regime, a stepsize decay as slow as $\Omega(1/\sqrt{t})$ suffices for stability-based bounds, versus the $\mathcal{O}(1/t)$ decay usually required in non-convex training.
- Findings apply broadly, since homogeneous networks include fully-connected and convolutional networks with ReLU and LeakyReLU activations.
Computer Science > Machine Learning
arXiv:2602.22936 (cs) [Submitted on 26 Feb 2026]
Title: Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
Authors: Wenquan Ma, Yang Sui, Jiaye Teng, Bohan Wang, Jing Xu, Jingqin Yang
Abstract: Algorithmic stability is among the most potent techniques in generalization analysis. However, its derivation usually requires a stepsize $\eta_t = \mathcal{O}(1/t)$ under non-convex training regimes, where $t$ denotes iterations. This rigid decay of the stepsize potentially impedes optimization and may not align with practical scenarios. In this paper, we derive the generalization bounds under the homogeneous neural network regimes, proving that this regime enables slower stepsize decay of order $\Omega(1/\sqrt{t})$ under mild assumptions. We further extend the theoretical results from several aspects, e.g., non-Lipschitz regimes. This finding is broadly applicable, as homogeneous neural networks encompass fully-connected and convolutional neural networks with ReLU and LeakyReLU activations.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.22936 [cs.LG] (arXiv:2602.22936v1 [cs.LG] for this version), https://doi.org/10.48550/arXiv.2602.22936
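The contrast between the two stepsize schedules can be sketched with a toy SGD run. This is an illustration only, not the paper's setting: a one-dimensional quadratic objective stands in for a homogeneous network, and the constants are arbitrary.

```python
import numpy as np

def sgd(schedule, T=1000, seed=0):
    """Run SGD on f(w) = E[(w - x)^2] / 2 with x ~ N(0, 1).

    A toy stand-in for neural network training: the stochastic gradient
    at step t is (w - x_t), and the stepsize follows `schedule(t)`.
    """
    rng = np.random.default_rng(seed)
    w = 5.0  # arbitrary initialization far from the minimizer w* = 0
    for t in range(1, T + 1):
        grad = w - rng.normal(0.0, 1.0)  # unbiased stochastic gradient
        w -= schedule(t) * grad
    return w

# Classical stability analyses require eta_t = O(1/t); the paper argues
# that homogeneous networks admit the slower decay eta_t = Omega(1/sqrt(t)).
w_fast_decay = sgd(lambda t: 0.5 / t)            # eta_t = O(1/t)
w_slow_decay = sgd(lambda t: 0.5 / np.sqrt(t))   # eta_t = Theta(1/sqrt(t))
```

With the $1/t$ schedule the stepsize shrinks so quickly that early iterates dominate, which is exactly the optimization slowdown the abstract points to; the $1/\sqrt{t}$ schedule keeps making meaningful progress for longer while, per the paper, still admitting stability-based generalization bounds in the homogeneous regime.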