[2602.17958] Bayesian Online Model Selection
Summary
This article presents a new Bayesian algorithm for online model selection in stochastic bandits, addressing a fundamental exploration challenge and backing the approach with both an oracle-style regret guarantee and empirical validation.
Why It Matters
The study advances machine learning by giving a principled way to compete with the best of several bandit learners in hindsight, a capability central to sequential decision-making in dynamic environments such as adaptive experimentation and online recommendation.
Key Takeaways
- Introduces a Bayesian algorithm for online model selection in stochastic bandits.
- Proves an oracle-style guarantee of $O\left( d^* M \sqrt{T} + \sqrt{MT} \right)$ on the Bayesian regret.
- Empirical validation shows performance competitive with the best base learner.
- Explores the impact of data sharing among learners to mitigate prior mis-specification.
- Addresses fundamental exploration challenges in adaptive learning environments.
Computer Science > Machine Learning
arXiv:2602.17958 (cs) [Submitted on 20 Feb 2026]
Title: Bayesian Online Model Selection
Authors: Aida Afshar, Yuke Zhang, Aldo Pacchiano
Abstract: Online model selection in Bayesian bandits raises a fundamental exploration challenge: when an environment instance is sampled from a prior distribution, how can we design an adaptive strategy that explores multiple bandit learners and competes with the best one in hindsight? We address this problem by introducing a new Bayesian algorithm for online model selection in stochastic bandits. We prove an oracle-style guarantee of $O\left( d^* M \sqrt{T} + \sqrt{MT} \right)$ on the Bayesian regret, where $M$ is the number of base learners, $d^*$ is the regret coefficient of the optimal base learner, and $T$ is the time horizon. We also validate our method empirically across a range of stochastic bandit settings, demonstrating performance that is competitive with the best base learner. Additionally, we study the effect of sharing data among base learners and its role in mitigating prior mis-specification.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2602.17958 [cs.LG] (or arXiv:2602.17958v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.17958 (arXiv-issued DOI via DataCite, pending registration)
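The setting the abstract describes can be made concrete with a small sketch. The code below is an illustrative toy, not the paper's algorithm: it pits a Bayesian meta-learner against a Bernoulli bandit, where the meta-learner keeps a Beta posterior on each base learner's per-round success and Thompson-samples which base learner acts each round. The base-learner class (`EpsilonGreedy`), the environment means, and all parameter choices are assumptions for illustration only.

```python
import random

class EpsilonGreedy:
    """A base bandit learner: epsilon-greedy over k arms (illustrative)."""
    def __init__(self, k, eps, rng):
        self.k, self.eps, self.rng = k, eps, rng
        self.counts = [0] * k
        self.values = [0.0] * k

    def select(self):
        # Explore uniformly with probability eps, otherwise play the greedy arm.
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.k)
        return max(range(self.k), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean update of the arm's estimated value.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

def meta_select(base_learners, env_means, horizon, rng):
    """Toy Bayesian model selection: Thompson-sample over base learners.

    Each learner i carries a Beta(succ_i, fail_i) posterior on its
    per-round success probability; the sampled-best learner acts and
    learns from its own data. This is only a sketch of the problem
    setting, not the algorithm analyzed in the paper.
    """
    M = len(base_learners)
    succ = [1.0] * M      # Beta prior pseudo-successes
    fail = [1.0] * M      # Beta prior pseudo-failures
    total = 0.0
    for _ in range(horizon):
        samples = [rng.betavariate(succ[i], fail[i]) for i in range(M)]
        i = max(range(M), key=lambda j: samples[j])
        learner = base_learners[i]
        arm = learner.select()
        reward = 1.0 if rng.random() < env_means[arm] else 0.0
        learner.update(arm, reward)        # chosen learner updates on its own data
        succ[i] += reward                  # meta-posterior update for learner i
        fail[i] += 1.0 - reward
        total += reward
    return total

rng = random.Random(0)
means = [0.2, 0.5, 0.8]                    # Bernoulli arm means (toy environment)
bases = [EpsilonGreedy(3, e, rng) for e in (0.01, 0.1, 0.5)]
reward = meta_select(bases, means, 2000, rng)
print(reward / 2000)                       # average reward; approaches the best arm's mean
```

The meta-learner's Beta posteriors play the role of the "adaptive strategy that explores multiple bandit learners": learners that earn reward get sampled more often, so play concentrates on the base learner best matched to the sampled environment instance.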