[2402.15127] Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention
Computer Science > Machine Learning
arXiv:2402.15127 (cs)
[Submitted on 23 Feb 2024 (v1), last revised 22 Mar 2026 (this version, v2)]
Title: Asymptotically and Minimax Optimal Regret Bounds for Multi-Armed Bandits with Abstention
Authors: Junwen Yang, Tianyuan Jin, Vincent Y. F. Tan
Abstract: We introduce a novel extension of the canonical multi-armed bandit problem that incorporates an additional strategic innovation: abstention. In this enhanced framework, the agent is not only tasked with selecting an arm at each time step, but also has the option to abstain from accepting the stochastic instantaneous reward before observing it. When opting for abstention, the agent either suffers a fixed regret or gains a guaranteed reward. This added layer of complexity naturally prompts the key question: can we develop algorithms that are both computationally efficient and asymptotically and minimax optimal in this setting? We answer this question in the affirmative by designing and analyzing algorithms whose regrets meet their corresponding information-theoretic lower bounds. Our results offer valuable quantitative insights into the benefits of the abstention option, laying the groundwork for further exploration in other online decision-making problems with such an option. Extensive numerical experiment...