[2602.14543] Truly Adapting to Adversarial Constraints in Constrained MABs
Summary
This paper presents algorithms for the constrained multi-armed bandit (MAB) problem, addressing both stochastic and adversarial constraints while minimizing regret and constraint violation.
Why It Matters
The study advances the understanding of constrained MABs in non-stationary environments, which matters for machine-learning applications where sequential decisions must be made while both losses and constraints drift over time. The results provide a framework for algorithms that adapt to such variation, improving the robustness of learning systems.
Key Takeaways
- Introduces the first algorithms achieving optimal rates of regret and positive constraint violation (violation summed over its positive parts, so later slack cannot cancel earlier overshoot) when constraints are stochastic and losses are adversarial.
- Addresses both full and bandit feedback scenarios, enhancing adaptability in non-stationary environments.
- Shows that when both losses and constraints are fully adversarial, guaranteeing both sublinear regret and sublinear violation is provably impossible, which motivates relaxed benchmarks and guides future research directions.
Computer Science > Machine Learning
arXiv:2602.14543 (cs) [Submitted on 16 Feb 2026]
Title: Truly Adapting to Adversarial Constraints in Constrained MABs
Authors: Francesco Emanuele Stradi, Kalana Kalupahana, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti
Abstract: We study the constrained variant of the multi-armed bandit (MAB) problem, in which the learner aims not only at minimizing the total loss incurred during the learning dynamic, but also at controlling the violation of multiple unknown constraints, under both full and bandit feedback. We consider a non-stationary environment that subsumes both stochastic and adversarial models and where, at each round, both losses and constraints are drawn from distributions that may change arbitrarily over time. In such a setting, it is provably not possible to guarantee both sublinear regret and sublinear violation. Accordingly, prior work has mainly focused either on settings with stochastic constraints or on relaxing the benchmark with fully adversarial constraints (e.g., via competitive ratios with respect to the optimum). We provide the first algorithms that achieve optimal rates of regret and positive constraint violation when the constraints are stochastic while the losses may vary arbitrarily, and that simultan...
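To make the interaction protocol in the abstract concrete, the following is a minimal primal-dual sketch of a constrained bandit learner under full feedback: an exponential-weights update on a Lagrangian-penalized loss, plus a gradient step on the multiplier when the expected constraint cost exceeds a threshold. The function name, step sizes, and single-constraint setup are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def constrained_hedge(losses, costs, threshold, eta=0.1, eta_dual=0.1, rng=None):
    """Illustrative primal-dual exponential-weights learner (NOT the paper's method).

    losses[t, i], costs[t, i]: loss and constraint cost of arm i at round t
    (full feedback: the whole vectors are revealed after each round).
    The learner tries to keep per-round cost below `threshold` while
    minimizing cumulative loss.
    Returns (total loss, total positive violation).
    """
    T, K = losses.shape
    rng = np.random.default_rng(0) if rng is None else rng
    log_w = np.zeros(K)   # log-weights over arms
    lam = 0.0             # Lagrange multiplier for the constraint
    total_loss, total_violation = 0.0, 0.0
    for t in range(T):
        # play an arm from the exponential-weights distribution
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        arm = rng.choice(K, p=p)
        total_loss += losses[t, arm]
        # "positive" violation: overshoot never cancels against later slack
        total_violation += max(costs[t, arm] - threshold, 0.0)
        # primal step: penalized loss mixes loss and constraint cost
        log_w -= eta * (losses[t] + lam * costs[t])
        # dual step: raise lambda when expected cost exceeds the threshold
        lam = max(0.0, lam + eta_dual * (p @ costs[t] - threshold))
    return total_loss, total_violation
```

The dual variable lam acts as an adaptive penalty: rounds in which the played distribution is too costly push it up, making costly arms less attractive in subsequent primal updates. The paper's actual guarantees concern far stronger settings (bandit feedback, multiple unknown constraints, arbitrarily changing distributions) than this toy loop covers.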