[2505.11985] Variance-Optimal Arm Selection: Misallocation Minimization and Best Arm Identification
Summary
This paper presents novel algorithms for selecting the arm with the highest variance among independent arms, focusing on misallocation minimization and best arm identification.
Why It Matters
The study addresses critical challenges in multi-armed bandit problems, particularly in optimizing decision-making processes in uncertain environments. By introducing new algorithms, it enhances the efficiency of identifying optimal choices, which is vital in various applications like finance and machine learning.
Key Takeaways
- Introduces UCB-VV and SHVV algorithms for variance-optimal arm selection.
- Demonstrates that UCB-VV achieves order optimality in misallocation minimization.
- SHVV shows superior performance in fixed-budget best arm identification.
- Extends theoretical frameworks to sub-Gaussian distributions with new concentration inequalities.
- Empirical results indicate UCB-VV outperforms traditional methods like epsilon-greedy.
Computer Science > Machine Learning arXiv:2505.11985 (cs) [Submitted on 17 May 2025 (v1), last revised 17 Feb 2026 (this version, v3)] Title:Variance-Optimal Arm Selection: Misallocation Minimization and Best Arm Identification Authors:Sabrina Khurshid, Gourab Ghatak, Mohammad Shahid Abdulla View a PDF of the paper titled Variance-Optimal Arm Selection: Misallocation Minimization and Best Arm Identification, by Sabrina Khurshid and 2 other authors View PDF HTML (experimental) Abstract:This paper focuses on selecting the arm with the highest variance from a set of $K$ independent arms. Specifically, we focus on two settings: (i) misallocation minimization setting, that penalizes the number of pulls of suboptimal arms in terms of variance, and (ii) fixed-budget best arm identification setting, that evaluates the ability of an algorithm to determine the arm with the highest variance after a fixed number of pulls. We develop a novel online algorithm called UCB-VV for the misallocation minimization (MM) and show that its upper bound on misallocation for bounded rewards evolves as $\mathcal{O}\left(\log{n}\right)$ where $n$ is the horizon. By deriving the lower bound on the misallocation, we show that UCB-VV is order optimal. For the fixed budget best arm identification (BAI) setting we propose the SHVV algorithm. We show that the upper bound of the error probability of SHVV evolves as $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ represents the complexity of the problem, a...