[2306.14853] Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

[2306.14853] Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2306.14853: Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

Mathematics > Optimization and Control arXiv:2306.14853 (math) [Submitted on 26 Jun 2023 (v1), last revised 24 Mar 2026 (this version, v4)] Title:Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles Authors:Lesi Chen, Yaohua Ma, Jingzhao Zhang View a PDF of the paper titled Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles, by Lesi Chen and 2 other authors View PDF Abstract:In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\epsilon$-stationary point within ${\mathcal{O}}(\epsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde {\mathcal{O}}(\epsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde {\mathcal{O}}(\epsilon^{-4})$ and $\tilde {\mathcal{O}}(\epsilon^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-ord...

Originally published on March 25, 2026. Curated by AI News.

Related Articles

Machine Learning

[P] Fused MoE Dispatch in Pure Triton: Beating CUDA-Optimized Megablocks at Inference Batch Sizes

I built a fused MoE dispatch kernel in pure Triton that handles the full forward pass for Mixture-of-Experts models. No CUDA, no vendor-s...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML Rebuttal Question

I am currently working on my response on the rebuttal acknowledgments for ICML and I doubting how to handle the strawman argument of that...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ML researcher looking to switch to a product company.

Hey, I am an AI researcher currently working in a deep tech company as a data scientist. Prior to this, I was doing my PhD. My current ro...

Reddit - Machine Learning · 1 min ·
Machine Learning

Building behavioural response models of public figures using Brain scan data (Predict their next move using psychological modelling) [P]

Hey guys, I’m the same creator of Netryx V2, the geolocation tool. I’ve been working on something new called COGNEX. It learns how a pers...

Reddit - Machine Learning · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime