[2510.15483] Fast Best-in-Class Regret for Contextual Bandits
Statistics > Machine Learning — arXiv:2510.15483 (stat)

[Submitted on 17 Oct 2025 (v1), last revised 3 Apr 2026 (this version, v2)]

Title: Fast Best-in-Class Regret for Contextual Bandits

Authors: Samuel Girard, Aurelien Bibaut, Arthur Gretton, Nathan Kallus, Houssam Zenati

Abstract: We study the problem of stochastic contextual bandits in the agnostic setting, where the goal is to compete with the best policy in a given class without assuming realizability or imposing model restrictions on losses or rewards. In this work, we establish the first fast rate for regret relative to the best-in-class policy. Our proposed algorithm updates the policy at every round by minimizing a pessimistic objective, defined as a clipped inverse-propensity estimate of the policy value plus a variance penalty. By leveraging entropy assumptions on the policy class and a Hölderian error-bound condition (a generalization of the margin condition), we achieve fast best-in-class regret rates, including polylogarithmic rates in the parametric case. The analysis is driven by a sequential self-normalized maximal inequality for bounded martingale empirical processes, which yields uniform variance-adaptive confidence bounds and guarantees pessimism under adaptive data collection.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)

Cite as: arXiv:2510.15483 [stat.ML]
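To make the abstract's key step concrete, here is a minimal sketch (not the paper's exact algorithm) of a pessimistic objective of the kind described: a clipped inverse-propensity (IPS) estimate of a policy's expected loss plus a variance penalty. The clipping level `clip`, the penalty weight `lam`, and all function names are illustrative assumptions; the paper's actual penalty is calibrated by its self-normalized maximal inequality.

```python
import numpy as np

def pessimistic_score(policy_probs, behavior_props, losses, clip=10.0, lam=1.0):
    """Clipped IPS loss estimate plus a variance penalty (illustrative sketch).

    policy_probs[t]   : probability the evaluated policy puts on the logged action
    behavior_props[t] : propensity of the logging policy for that action
    losses[t]         : observed loss at round t
    """
    n = len(losses)
    # Clipped importance weights keep the estimate's variance bounded.
    weights = np.minimum(policy_probs / behavior_props, clip)
    terms = weights * losses
    mean = terms.mean()                 # clipped IPS point estimate
    var = terms.var(ddof=1)             # sample variance of the IPS terms
    # Pessimism: penalize policies whose value estimate is uncertain.
    return mean + lam * np.sqrt(var / n)

def select_policy(candidates, behavior_props, losses):
    """Pick the candidate minimizing the pessimistic objective.

    candidates: list of arrays, each giving the candidate policy's
    probability of the logged action at every round.
    """
    scores = [pessimistic_score(p, behavior_props, losses) for p in candidates]
    return int(np.argmin(scores))
```

In an online loop, the paper's algorithm would re-run this minimization over the policy class after each round as new logged data arrives; the sketch above shows only the batch objective for a finite candidate set.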