[2302.00797] Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

arXiv - AI April 07, 2026 4 min read

About this article

Abstract page for arXiv paper 2302.00797: Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Computer Science > Artificial Intelligence

arXiv:2302.00797 (cs)

[Submitted on 1 Feb 2023 (v1), last revised 5 Apr 2026 (this version, v4)]

Title: Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Authors: Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman

Abstract: Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heuristics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect information domains and can be plug and play in a variety of multiagent algorithms. We use this ne...
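The two steps named in the abstract (form a belief over opponent strategies, then play a best response to it) can be illustrated on a toy matrix game. The sketch below is not GenBR: it swaps the paper's MCTS and learned generative model for exact enumeration over a three-action game, and the payoff matrix, candidate strategies, belief weights, and function names are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical payoff matrix for the modeling player in rock-paper-scissors.
# PAYOFF[my_action][opponent_action]; actions are 0=rock, 1=paper, 2=scissors.
PAYOFF: List[List[float]] = [
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
]


def opponent_mixture(belief: List[float],
                     strategies: List[List[float]]) -> List[float]:
    """Step 1: collapse a belief over opponent strategies into a single
    distribution over opponent actions (belief-weighted average)."""
    num_actions = len(strategies[0])
    return [
        sum(weight * strategy[action]
            for weight, strategy in zip(belief, strategies))
        for action in range(num_actions)
    ]


def best_response(payoff: List[List[float]],
                  opp_dist: List[float]) -> Tuple[int, List[float]]:
    """Step 2: pick the action with the highest expected payoff against
    the opponent action distribution. Returns (action, expected values)."""
    values = [
        sum(p * q for p, q in zip(row, opp_dist))
        for row in payoff
    ]
    best_action = max(range(len(values)), key=values.__getitem__)
    return best_action, values


# Assumed opponent model: two candidate strategies, believed equally likely.
always_rock = [1.0, 0.0, 0.0]
paper_or_scissors = [0.0, 0.5, 0.5]
belief = [0.5, 0.5]

mixture = opponent_mixture(belief, [always_rock, paper_or_scissors])
action, values = best_response(PAYOFF, mixture)
```

With these assumed numbers the opponent plays rock half the time, so paper (action 1) has the highest expected payoff. In the large imperfect-information settings the abstract targets, this exact enumeration is not feasible, which is the gap that a search-based approximation such as GenBR is described as addressing.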

Originally published on April 07, 2026. Curated by AI News.
