[2502.00204] Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
Summary
This paper presents algorithms for nearly-optimal bandit learning in Stackelberg games with side information, achieving O(T^{1/2}) regret under bandit feedback and extending to second-price auctions and online Bayesian persuasion.
Why It Matters
The study enhances understanding of online learning in strategic environments, providing significant improvements in regret rates for leaders in Stackelberg games. This has implications for various fields, including economics and AI, where decision-making under uncertainty is crucial.
Key Takeaways
- Introduces learning algorithms for the leader that achieve O(T^{1/2}) regret under bandit feedback.
- Improves on the previously best-known O(T^{2/3}) regret rate.
- Extends applications to second-price auctions and online Bayesian persuasion.
- Relies on a reduction to linear contextual bandits in the utility space: each round, a recommended utility vector is inverted to recover the leader's mixed strategy.
- Empirical results show superior performance over earlier methods.
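To make the reduction in the takeaways above concrete, here is an illustrative sketch of a LinUCB-style linear bandit loop, the kind of subroutine the paper's reduction drives. This is not the paper's algorithm: the inversion from a recommended utility vector to a leader mixed strategy is omitted, the arm set is fixed rather than context-dependent for brevity, and all names and parameters are our own.

```python
# Minimal LinUCB-style linear bandit sketch (illustrative only).
# In the paper's reduction, such a subroutine recommends a utility
# vector each round, which is then inverted into the leader's mixed
# strategy; that inversion step is not shown here.
import numpy as np

def linucb(arms, reward_fn, T, alpha=1.0):
    """Pick arms optimistically using a ridge-regression estimate of the
    unknown linear reward parameter. `arms` is an (n, d) array of arm
    feature vectors; in the contextual setting these would vary each
    round with the observed context."""
    d = arms.shape[1]
    A = np.eye(d)        # ridge-regularized Gram matrix
    b = np.zeros(d)      # running sum of reward-weighted features
    choices = []
    for _ in range(T):
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        # optimistic index: estimated reward + confidence width
        scores = [x @ theta_hat + alpha * np.sqrt(x @ A_inv @ x) for x in arms]
        i = int(np.argmax(scores))
        x = arms[i]
        r = reward_fn(x)  # bandit feedback: reward of the chosen arm only
        A += np.outer(x, x)
        b += r * x
        choices.append(i)
    return choices
```

With a deterministic linear reward this loop quickly concentrates on the best arm; the paper's contribution is showing that the leader's strategic problem can be driven by such a subroutine at O(T^{1/2}) regret.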
Computer Science > Machine Learning, arXiv:2502.00204 (cs)
Submitted on 31 Jan 2025 (v1), last revised 19 Feb 2026 (this version, v2)
Authors: Maria-Florina Balcan, Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Keegan Harris, Zhiwei Steven Wu
Abstract: We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement from the previously best-known rates of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: in each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply them to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and private states. Finally, we observe that our algorithms empirically outperform ...