[2306.14853] Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

arXiv - Machine Learning March 25, 2026 4 min read

Mathematics > Optimization and Control

arXiv:2306.14853 (math)

[Submitted on 26 Jun 2023 (v1), last revised 24 Mar 2026 (this version, v4)]

Title: Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

Authors: Lesi Chen, Yaohua Ma, Jingzhao Zhang

Abstract: In this work, we consider bilevel optimization when the lower-level problem is strongly convex. Recent works show that with a Hessian-vector product (HVP) oracle, one can provably find an $\epsilon$-stationary point within $\mathcal{O}(\epsilon^{-2})$ oracle calls. However, the HVP oracle may be inaccessible or expensive in practice. Kwon et al. (ICML 2023) addressed this issue by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this paper, we incorporate a two-time-scale update to improve their method to achieve the near-optimal $\tilde{\mathcal{O}}(\epsilon^{-2})$ first-order oracle complexity. Our analysis is highly extensible. In the stochastic setting, our algorithm can achieve the stochastic first-order oracle complexity of $\tilde{\mathcal{O}}(\epsilon^{-4})$ and $\tilde{\mathcal{O}}(\epsilon^{-6})$ when the stochastic noises are only in the upper-level objective and in both level objectives, respectively. When the objectives have higher-ord...
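To make the idea of a fully first-order bilevel method concrete, here is a minimal sketch on a toy problem. It is not the authors' algorithm. It assumes a penalty-style surrogate, f(x, y) + lam * (g(x, y) - min_z g(x, z)), of the kind used in this line of work, and it uses only gradients of f and g (no Hessian-vector products). The toy objectives, the function names, the penalty weight `lam`, and all step sizes are illustrative assumptions. The auxiliary variables y and z are updated with their own, faster step sizes, while x takes a slower step. This is only meant to illustrate the flavor of a two-time-scale update, not the paper's exact schedule.

```python
# Toy bilevel problem (an assumption for illustration only):
#   upper level: f(x, y) = 0.5*(y - 1)^2 + 0.5*x^2
#   lower level: g(x, y) = 0.5*(y - x)^2, strongly convex in y, so y*(x) = x
# The true hyper-objective is phi(x) = f(x, y*(x)), minimized at x = 0.5.


def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0


def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return x - y, y - x


def penalty_first_order_bilevel(x0, lam=100.0, outer_steps=200,
                                inner_steps=5, eta_x=0.25):
    """Hypothetical helper: gradient-only bilevel descent on a penalty surrogate."""
    x, y, z = x0, x0, x0
    eta_z = 1.0                # 1 / smoothness of g in y (toy-specific)
    eta_y = 1.0 / (1.0 + lam)  # 1 / smoothness of f + lam*g in y (toy-specific)
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # z tracks argmin_z g(x, z)
            z -= eta_z * grad_g(x, z)[1]
            # y tracks argmin_y f(x, y) + lam * g(x, y)
            y -= eta_y * (grad_f(x, y)[1] + lam * grad_g(x, y)[1])
        # First-order estimate of the hypergradient: no second derivatives needed
        gx = grad_f(x, y)[0] + lam * (grad_g(x, y)[0] - grad_g(x, z)[0])
        x -= eta_x * gx
    return x, y, z


if __name__ == "__main__":
    x, y, z = penalty_first_order_bilevel(x0=3.0)
    print(x, y, z)
```

On this toy problem the surrogate's stationary point is x = lam / (1 + 2*lam), which approaches the true minimizer 0.5 as `lam` grows. That gap is the bias a finite penalty weight introduces, and it is one reason such methods tie the penalty weight to the target accuracy.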

Originally published on March 25, 2026. Curated by AI News.
