[2602.14474] One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise
Summary
The paper presents SOAR, a novel algorithm for the $K$-armed Multiarmed Bandit (MAB) problem with multiple heterogeneous data sources, which achieves near-optimal regret despite the sources' unknown and distinct noise variances.
Why It Matters
This research addresses a significant challenge in machine learning: selecting among data sources with varying noise levels. By introducing SOAR, the authors provide a method that not only minimizes regret but also improves the efficiency of decision-making in uncertain environments, which is crucial for applications in AI and data science.
Key Takeaways
- SOAR algorithm effectively minimizes regret in K-armed bandit problems with heterogeneous noise.
- The approach integrates variance-concentration bounds for efficient source selection.
- SOAR achieves near-optimal instance-dependent regret, matching the rate of single-source MAB with the best (minimum-variance) source, despite the variances being unknown.
- Experimental results demonstrate SOAR's superiority over traditional baselines like Uniform UCB.
- The findings have implications for real-world applications where data source variability is common.
Abstract
Computer Science > Machine Learning, arXiv:2602.14474 (cs). Submitted on 16 Feb 2026.
Title: One Good Source is All You Need: Near-Optimal Regret for Bandits under Heterogeneous Noise
Authors: Aadirupa Saha, Amith Bhat, Haipeng Luo
We study the $K$-armed Multiarmed Bandit (MAB) problem with $M$ heterogeneous data sources, each exhibiting unknown and distinct noise variances $\{\sigma_j^2\}_{j=1}^M$. The learner's objective is standard MAB regret minimization, with the additional complexity of adaptively selecting which data source to query at each round. We propose Source-Optimistic Adaptive Regret minimization (SOAR), a novel algorithm that quickly prunes high-variance sources using sharp variance-concentration bounds, followed by a `balanced min-max LCB-UCB approach' that seamlessly integrates the parallel tasks of identifying the best arm and the optimal (minimum-variance) data source. Our analysis shows SOAR achieves an instance-dependent regret bound of $\tilde{O}\left({\sigma^*}^2\sum_{i=2}^K \frac{\log T}{\Delta_i} + \sqrt{K \sum_{j=1}^M \sigma_j^2}\right)$, up to preprocessing costs depending only on problem parameters, where ${\sigma^*}^2 := \min_j \sigma_j^2$ is the minimum source variance and $\Delta_i$ denotes the suboptimality gap of the $i$-th arm. This result is bot...
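The pruning phase described in the abstract, eliminating high-variance sources via variance-concentration bounds, can be sketched as follows. The confidence radius below is a generic placeholder of empirical-Bernstein flavor, not the sharp bounds proven in the paper, and the variances and sample count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

M, n = 4, 500
true_vars = np.array([0.25, 1.0, 4.0, 9.0])   # hypothetical source variances

# Draw n zero-mean samples per source and form empirical variances.
samples = rng.standard_normal((M, n)) * np.sqrt(true_vars)[:, None]
emp_var = samples.var(axis=1, ddof=1)

# Generic confidence radius on each empirical variance (placeholder constants,
# not the variance-concentration bounds from the paper).
delta = 0.05
rad = emp_var * np.sqrt(np.log(2 * M / delta) / n) + np.log(2 * M / delta) / n

lcb = emp_var - rad    # lower confidence bound on each source variance
ucb = emp_var + rad    # upper confidence bound on each source variance

# Prune any source whose variance LCB exceeds the smallest variance UCB:
# with high probability such a source cannot be the minimum-variance one.
keep = lcb <= ucb.min()
print("kept sources:", np.flatnonzero(keep))
```

After this elimination step, only sources statistically indistinguishable from the minimum-variance source remain, which is what lets the subsequent arm-identification phase pay (roughly) only the best source's variance ${\sigma^*}^2$ in its regret.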