[2604.00433] Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games

arXiv - Machine Learning 3 min read

Computer Science > Multiagent Systems

arXiv:2604.00433 (cs)
[Submitted on 1 Apr 2026]

Title: Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games

Authors: Wonseok Yang, Thinh T. Doan

Abstract: This letter studies multi-agent reinforcement learning in partially observable Markov potential games. Solving this problem is challenging due to partial observability, decentralized information, and the curse of dimensionality. First, to address the first two challenges, we leverage the common information framework, which allows agents to act based on both shared and local information. Second, to ensure tractability, we study an internal state that compresses accumulated information, preventing it from growing unboundedly over time. We then implement an internal state-based natural policy gradient method to find Nash equilibria of the Markov potential game. Our main contribution is to establish a non-asymptotic convergence bound for this method. Our theoretical bound decomposes into two interpretable components: a statistical error term that also arises in standard Markov potential games, and an approximation error capturing the use of finite-state controllers. Finally, simulations across multiple partially observable environments demonstrate that the proposed m...
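To make the idea of an internal state concrete, here is a minimal Python sketch of a single agent's finite-state controller. It is not the authors' algorithm: the class name `FiniteStateController`, the modular compression rule in `update_internal_state`, and the tabular softmax parameterization are all assumptions chosen for illustration. The sketch only shows how a bounded internal state can stand in for a growing observation history, and how a natural policy gradient step is commonly written for tabular softmax policies (adding scaled advantage estimates to the logits).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FiniteStateController:
    """Hypothetical controller: the policy conditions on a bounded internal state."""

    def __init__(self, num_internal_states, num_observations, num_actions):
        self.num_internal_states = num_internal_states
        self.num_observations = num_observations
        self.num_actions = num_actions
        # One row of action logits per internal state (tabular softmax policy).
        self.logits = [[0.0] * num_actions for _ in range(num_internal_states)]

    def update_internal_state(self, z, observation):
        # Assumed compression rule, for illustration only: fold the new
        # observation into z while staying inside a fixed finite set.
        return (z * self.num_observations + observation) % self.num_internal_states

    def action_probs(self, z):
        return softmax(self.logits[z])

    def npg_step(self, advantages, step_size):
        # For tabular softmax policies, a natural policy gradient step can be
        # written as logits += step_size * advantage (any discount-dependent
        # constant is folded into step_size here).
        for z in range(self.num_internal_states):
            for a in range(self.num_actions):
                self.logits[z][a] += step_size * advantages[z][a]


controller = FiniteStateController(num_internal_states=2, num_observations=2, num_actions=2)
z = 0
for obs in [1, 0, 1, 1]:  # the history grows, but z stays in {0, 1}
    z = controller.update_internal_state(z, obs)

# Hypothetical advantage estimates, indexed as advantages[internal_state][action].
controller.npg_step(advantages=[[1.0, -1.0], [0.0, 0.0]], step_size=1.0)
probs = controller.action_probs(0)
```

In a multi-agent setting each agent would hold its own controller, and the advantage estimates would come from sampled trajectories; both are left out of this sketch.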

Originally published on April 02, 2026. Curated by AI News.
