[2506.14518] Two-Player Zero-Sum Games with Bandit Feedback

arXiv - Machine Learning · 4 min read · Article

Summary

This paper studies two-player zero-sum games under bandit feedback, proposing three algorithms that adapt the Explore-Then-Commit (ETC) and action pair elimination frameworks to select action pairs and minimize regret.

Why It Matters

Understanding zero-sum games is crucial in fields like economics and AI, where strategic decision-making is essential. The proposed algorithms enhance learning efficiency in competitive scenarios, contributing to advancements in machine learning and game theory.

Key Takeaways

  • The study introduces three algorithms for learning pure-strategy Nash Equilibria in zero-sum games with bandit feedback.
  • Instance-dependent regret upper bounds are derived, a type of guarantee that has received limited attention for zero-sum games.
  • The algorithms adapt the Explore-Then-Commit and action pair elimination frameworks to the competitive two-player setting.
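The ε-Nash Equilibrium property that the elimination algorithms leverage can be illustrated with a small check on a known payoff matrix. This is a hypothetical sketch (the function name and signature are not from the paper): an action pair is an ε-pure-strategy Nash Equilibrium if neither player can improve their payoff by more than ε through a unilateral deviation.

```python
import numpy as np

def is_eps_pure_nash(A, i, j, eps=0.0):
    """Check whether (i, j) is an eps-pure-strategy Nash Equilibrium
    of the zero-sum payoff matrix A (row player maximizes A[i, j],
    column player minimizes it).

    Illustrative sketch only; not the paper's algorithm.
    """
    A = np.asarray(A, dtype=float)
    # Row player cannot gain more than eps by switching rows in column j.
    row_ok = A[i, j] >= A[:, j].max() - eps
    # Column player cannot gain more than eps by switching columns in row i.
    col_ok = A[i, j] <= A[i, :].min() + eps
    return bool(row_ok and col_ok)
```

For example, in the matrix [[3, 1], [2, 0]] the pair (0, 1) is a saddle point (its value 1 is both the row minimum and the column maximum), so the check passes with eps = 0, while (0, 0) fails.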

Computer Science > Machine Learning · arXiv:2506.14518 (cs)
[Submitted on 17 Jun 2025 (v1), last revised 19 Feb 2026 (this version, v4)]

Title: Two-Player Zero-Sum Games with Bandit Feedback
Authors: Elif Yılmaz, Christos Dimitrakakis

Abstract: We study a two-player zero-sum game in which the row player aims to maximize their payoff against a competing column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the Explore-Then-Commit (ETC) and action pair elimination frameworks. The first adapts ETC to zero-sum games, the second incorporates adaptive elimination that leverages the $\varepsilon$-Nash Equilibrium property to efficiently select the optimal action pair, and the third extends the elimination algorithm by employing non-uniform exploration. Our objective is to demonstrate the applicability of ETC and action pair elimination algorithms in a zero-sum game setting by focusing on learning pure strategy Nash Equilibria. A key contribution of our work is the derivation of instance-dependent upper bounds on the expected regret of our proposed algorithms, which has received limited attention in the literature on zero-sum games. In particular, after $T$ rounds, we achieve an instance-dependent regret upper bound of $O(\Delta + \sqrt{T})$ for ETC in the zero-sum game setting an...
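The ETC adaptation described in the abstract can be sketched as follows: sample every action pair a fixed number of times, build an empirical payoff matrix, then commit to the empirical saddle point for the remaining rounds. This is a minimal sketch under assumed names and interfaces (`payoff_sampler`, the exploration budget `m`), not the paper's exact algorithm:

```python
import numpy as np

def etc_zero_sum(payoff_sampler, n_rows, n_cols, m, T):
    """Explore-Then-Commit sketch for a zero-sum matrix game.

    payoff_sampler(i, j) returns a (possibly noisy) payoff observation
    for action pair (i, j). Each of the n_rows * n_cols pairs is
    explored m times; the remaining rounds commit to the empirical
    pure-strategy equilibrium of the estimated matrix.
    """
    sums = np.zeros((n_rows, n_cols))
    # Exploration phase: sample every action pair m times.
    for _ in range(m):
        for i in range(n_rows):
            for j in range(n_cols):
                sums[i, j] += payoff_sampler(i, j)
    est = sums / m
    # Commit phase: the row player maximizes their worst-case payoff,
    # the column player minimizes their worst-case loss.
    i_star = int(np.argmax(est.min(axis=1)))
    j_star = int(np.argmin(est.max(axis=0)))
    committed_rounds = T - m * n_rows * n_cols
    return (i_star, j_star), committed_rounds
```

With a noiseless sampler over the matrix [[3, 1], [2, 0]], m = 2, and T = 100, the sketch commits to the saddle point (0, 1) for the remaining 92 rounds. The instance-dependent regret of such a scheme depends on the gap $\Delta$ between the equilibrium value and the best deviations, as the abstract's $O(\Delta + \sqrt{T})$ bound suggests.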

