[2408.01839] Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $α$-PL
Summary
This paper explores the local convergence rates of stochastic first-order methods under the local α-Polyak-Lojasiewicz condition, establishing both lower and upper bounds for oracle queries needed to achieve desired accuracy.
Why It Matters
Understanding the convergence rates of stochastic optimization methods is crucial for improving algorithm efficiency in machine learning and optimization tasks. This research provides insights that can enhance the performance of algorithms in practical applications, particularly in convex settings.
Key Takeaways
- Establishes lower and upper bounds for oracle queries in stochastic first-order methods.
- Introduces the local α-PL condition, bridging classical and Holder-type error bounds.
- Demonstrates a SARAH-type variance-reduced method achieving optimal convergence rates.
- Highlights implications for reaching global optima in convex settings.
- Contributes to theoretical understanding of stochastic optimization complexities.
Mathematics > Optimization and Control arXiv:2408.01839 (math) [Submitted on 3 Aug 2024 (v1), last revised 23 Feb 2026 (this version, v2)] Title:Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $α$-PL Authors:Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran View a PDF of the paper titled Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $\alpha$-PL, by Saeed Masiha and 4 other authors View PDF HTML (experimental) Abstract:We study the local convergence rate of stochastic first-order methods under a local $\alpha$-Polyak-Lojasiewicz ($\alpha$-PL) condition in a neighborhood of a target connected component $\mathcal{M}$ of the local minimizer set. The parameter $\alpha \in [1,2]$ is the exponent of the gradient norm in the $\alpha$-PL inequality: $\alpha=2$ recovers the classical PL case, $\alpha=1$ corresponds to Holder-type error bounds, and intermediate values interpolate between these regimes. Our performance criterion is the number of oracle queries required to output $\hat{x}$ with $F(\hat{x})-l \le \varepsilon$, where $l := F(y)$ for any $y \in \mathcal{M}$. We work in a local regime where the algorithm is initialized near $\mathcal{M}$ and, with high probability, its iterates remain in that neighborhood. We establish a lower bound $\Omega(\varepsilon^{-2/\alpha})$ for all stochastic first-order methods in this regime, and we obtain a matching upper bound $\mathcal{O}(\varepsilon^{-2/\a...