[2602.17940] Tighter Regret Lower Bound for Gaussian Process Bandits with Squared Exponential Kernel in Hypersphere
Summary
This paper presents a tighter lower bound on cumulative regret for Gaussian process bandits using a squared exponential kernel in a hyperspherical domain, addressing an open question in the field.
Why It Matters
Understanding the regret bounds in Gaussian process bandits is crucial for developing efficient algorithms in machine learning. This research contributes to the theoretical foundation of bandit problems, which are widely applicable in optimization, reinforcement learning, and decision-making scenarios.
Key Takeaways
- Establishes a tighter, algorithm-independent worst-case lower bound on cumulative regret for Gaussian process bandits in the frequentist setting.
- Focuses on the squared exponential kernel in a hyperspherical domain.
- Addresses the gap in dimension-dependent logarithmic factors between upper and lower bounds.
- Provides improved upper bounds on maximum information gain for the SE kernel.
- Confirms the optimality of existing algorithms up to dimension-independent factors.
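The rates behind these takeaways, taken from the paper's abstract (the symbol $T_\epsilon$ for the number of steps needed to find an $\epsilon$-optimal point is introduced here for readability):

```latex
% T: total number of steps, d: dimension of the hyperspherical domain,
% \epsilon: target accuracy.
\begin{align*}
  \text{Cumulative regret:} \quad
    & R_T = \Omega\!\left(\sqrt{T\,(\ln T)^{d}\,(\ln \ln T)^{-d}}\right) \\
  \text{Simple regret:} \quad
    & T_\epsilon = \Omega\!\left(\epsilon^{-2}\,\bigl(\ln \tfrac{1}{\epsilon}\bigr)^{d}\,\bigl(\ln \ln \tfrac{1}{\epsilon}\bigr)^{-d}\right) \\
  \text{Max.\ information gain (SE kernel):} \quad
    & \gamma_T = O\!\left((\ln T)^{d+1}\,(\ln \ln T)^{-d}\right)
\end{align*}
```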
Computer Science > Machine Learning
arXiv:2602.17940 (cs)
[Submitted on 20 Feb 2026]
Title: Tighter Regret Lower Bound for Gaussian Process Bandits with Squared Exponential Kernel in Hypersphere
Authors: Shogo Iwazaki
Abstract: We study an algorithm-independent, worst-case lower bound for the Gaussian process (GP) bandit problem in the frequentist setting, where the reward function is fixed and has a bounded norm in the known reproducing kernel Hilbert space (RKHS). Specifically, we focus on the squared exponential (SE) kernel, one of the most widely used kernel functions in GP bandits. One of the remaining open questions for this problem is the gap in the \emph{dimension-dependent} logarithmic factors between upper and lower bounds. This paper partially resolves this open question under a hyperspherical input domain. We show that any algorithm suffers $\Omega(\sqrt{T (\ln T)^{d} (\ln \ln T)^{-d}})$ cumulative regret, where $T$ and $d$ represent the total number of steps and the dimension of the hyperspherical domain, respectively. Regarding the simple regret, we show that any algorithm requires $\Omega(\epsilon^{-2}(\ln \frac{1}{\epsilon})^d (\ln \ln \frac{1}{\epsilon})^{-d})$ time steps to find an $\epsilon$-optimal point. We also provide the improved $O((\ln T)^{d+1}(\ln \ln T)^{-d})$ upper bound on the maximum information gain for the SE kernel.
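To get a feel for how the dimension-dependent polylog factor in the cumulative-regret bound behaves, the sketch below evaluates the rate $\sqrt{T (\ln T)^{d} (\ln \ln T)^{-d}}$ (ignoring constants, which the bound leaves unspecified) and compares it against the plain $\sqrt{T}$ rate. The function name and the sample values of $T$ and $d$ are illustrative choices, not from the paper.

```python
import math

def cumulative_regret_rate(T: int, d: int) -> float:
    """Rate of the paper's cumulative-regret lower bound, up to constants:
    sqrt(T * (ln T)^d / (ln ln T)^d).  Requires T large enough that
    ln(ln(T)) > 0, i.e. T > e^e ~ 15.2."""
    log_T = math.log(T)
    log_log_T = math.log(log_T)
    return math.sqrt(T * (log_T ** d) / (log_log_T ** d))

# At T = 10^6, the (ln T / ln ln T)^(d/2) factor inflates the bound
# relative to sqrt(T), and the inflation grows with the dimension d.
T = 10**6
for d in (1, 2, 3):
    ratio = cumulative_regret_rate(T, d) / math.sqrt(T)
    print(f"d={d}: bound / sqrt(T) = {ratio:.2f}")
```

For $d = 0$ the rate collapses to $\sqrt{T}$, matching the kernel-free parametric rate; each extra dimension multiplies the bound by another $\sqrt{\ln T / \ln \ln T}$ factor.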