[2603.25029] Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
arXiv:2603.25029 (cs)
[Submitted on 26 Mar 2026]
Title: Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
Authors: Haishan Ye
Abstract: We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback in an adversarial environment. In this setting, a player attempts to minimize a sequence of adversarially generated convex loss functions while observing the value of each function at only two points. Although it is well known that two-point feedback allows for gradient estimation, achieving tight high-probability regret bounds for strongly convex functions has remained open, as highlighted by \citet{agarwal2010optimal}. The primary challenge lies in the heavy-tailed nature of bandit gradient estimators, which makes standard concentration analysis difficult. In this paper, we resolve this open challenge by providing the first high-probability regret bound of $O(d(\log T + \log(1/\delta))/\mu)$ for $\mu$-strongly convex losses. Our result is minimax optimal with respect to both the time horizon $T$ and the dimension $d$.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2603.25029 [cs.LG] (or arXiv:2603.25029v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2603.25029
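The abstract notes that two-point feedback allows for gradient estimation. As a rough illustration of that idea, the sketch below uses the classic symmetric two-point estimator with a uniformly random direction, combined with projected online gradient descent and a step size of 1/(mu * t), which is a standard choice for mu-strongly convex losses. This is not taken from the paper: the abstract does not specify the algorithm, and every name here (`two_point_gradient`, `run_two_point_ogd`, `project_ball`, the unit-ball feasible set, the smoothing radius `delta`) is a hypothetical choice made for illustration. Note that `delta` in the code is the smoothing radius, not the confidence parameter $\delta$ in the regret bound.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def two_point_gradient(f, x, delta, rng):
    """Estimate the gradient of f at x from exactly two evaluations of f."""
    d = len(x)
    u = random_unit_vector(d, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Symmetric finite difference along u, rescaled by the dimension d.
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return x
    return [radius * v / norm for v in x]


def run_two_point_ogd(losses, d, mu, radius=1.0, delta=1e-3, seed=0):
    """Projected gradient descent driven only by two-point function values.

    Assumption: the feasible set is a Euclidean ball. For simplicity the two
    query points may fall up to delta outside it; a careful implementation
    would project onto a slightly shrunk set instead.
    """
    rng = random.Random(seed)
    x = [0.0] * d
    iterates = []
    for t, f in enumerate(losses, start=1):
        iterates.append(x)
        g = two_point_gradient(f, x, delta, rng)
        eta = 1.0 / (mu * t)  # step size for mu-strongly convex losses
        x = project_ball([xi - eta * gi for xi, gi in zip(x, g)], radius)
    return iterates
```

A caller would pass a sequence of loss functions, one per round, each accepting a list of floats and returning a float. The estimator only ever calls each loss twice, which matches the feedback model described in the abstract; the high-probability analysis that is the paper's contribution is not reflected in this sketch.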