[2502.00204] Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information

arXiv - Machine Learning · 4 min read

Summary

This paper presents algorithms for nearly-optimal bandit learning in Stackelberg games with side information, improving the leader's regret from O(T^{2/3}) to O(T^{1/2}) and extending the approach to second-price auctions and online Bayesian persuasion.

Why It Matters

The study enhances understanding of online learning in strategic environments, providing significant improvements in regret rates for leaders in Stackelberg games. This has implications for various fields, including economics and AI, where decision-making under uncertainty is crucial.

Key Takeaways

  • Introduces algorithms achieving O(T^{1/2}) regret under bandit feedback in Stackelberg games with side information.
  • Improves on the previously best-known O(T^{2/3}) regret rates.
  • Extends applications to second-price auctions and online Bayesian persuasion.
  • Relies on a reduction to linear contextual bandits in the utility space: a recommended utility vector is inverted to recover the leader's mixed strategy.
  • Empirical results show superior performance over earlier methods.

Computer Science > Machine Learning
arXiv:2502.00204 (cs) [Submitted on 31 Jan 2025 (v1), last revised 19 Feb 2026 (this version, v2)]

Title: Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
Authors: Maria-Florina Balcan, Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Keegan Harris, Zhiwei Steven Wu

Abstract: We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement from the previously best-known rates of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: In each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply it to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and private states. Finally, we observe that our algorithms empirically outperform ...
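The reduction described in the abstract can be illustrated with a toy loop. This is not the paper's algorithm, only a minimal sketch under assumed simplifications: leader utilities are linear in the context, each action's parameters are estimated by per-action ridge regression (a simplified LinUCB-style update without the confidence bonus), and the "inversion" from estimated utilities to a mixed strategy is stood in for by a softmax. The dimensions `d`, `K`, the noise level, and `theta_true` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 4, 200  # context dim, leader actions, rounds

# Hypothetical ground truth: expected utility of action k in context x is x @ theta_true[:, k]
theta_true = rng.normal(size=(d, K))

# Per-action ridge-regression statistics (A = X^T X + I, b = X^T r)
A = [np.eye(d) for _ in range(K)]
b = [np.zeros(d) for _ in range(K)]

total = 0.0
for t in range(T):
    x = rng.normal(size=d)  # observed side information (context)
    # Contextual-bandit step: estimate a utility vector, one entry per leader action
    u_hat = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(K)])
    # Stand-in for the paper's inversion step: map utilities to a mixed strategy
    p = np.exp(u_hat - u_hat.max())
    p /= p.sum()
    a = rng.choice(K, p=p)  # leader samples an action from the mixed strategy
    # Bandit feedback: only the realized (noisy) utility of the chosen action
    r = x @ theta_true[:, a] + 0.1 * rng.normal()
    A[a] += np.outer(x, x)
    b[a] += r * x
    total += r
```

The point of the sketch is the information flow in the reduction: the bandit algorithm works entirely in utility space, and the mixed strategy is derived from its recommendation each round rather than learned directly.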

Related Articles

As Meta Flounders, It Reportedly Plans to Open Source Its New AI Models
Machine Learning · AI Tools & Products · 5 min

Google quietly launched an AI dictation app that works offline
Machine Learning · TechCrunch - AI · 4 min

Why do the various LLM disappoint me in reading requests?
Serious question here. I have tried various LLM over the past year to help me choose fictional novels to read based on a decent amount of...
LLMs · Reddit - Artificial Intelligence · 1 min

UMKC Announces New Master of Science in Artificial Intelligence
UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...
AI Infrastructure · AI News - General · 4 min