[2603.06977] NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning
Computer Science > Machine Learning
arXiv:2603.06977 (cs)
[Submitted on 7 Mar 2026 (v1), last revised 4 Apr 2026 (this version, v2)]

Title: NePPO: Near-Potential Policy Optimization for General-Sum Multi-Agent Reinforcement Learning
Authors: Addison Kalanther, Sanika Bharvirkar, Shankar Sastry, Chinmay Maheshwari

Abstract: Multi-agent reinforcement learning (MARL) is increasingly used to design learning-enabled agents that interact in shared environments. However, training MARL algorithms in general-sum games remains challenging: learning dynamics can become unstable, and convergence guarantees typically hold only in restricted settings such as two-player zero-sum or fully cooperative games. Moreover, when agents have heterogeneous and potentially conflicting preferences, it is unclear what system-level objective should guide learning. In this paper, we propose a new MARL pipeline called Near-Potential Policy Optimization (NePPO) for computing approximate Nash equilibria in mixed cooperative-competitive environments. The core idea is to learn a player-independent potential function such that the Nash equilibrium of a cooperative game with this potential as the common utility approximates a Nash equilibrium of the original game. To this end, we introduce a novel MARL objective such that ...
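The core idea above, fitting a single player-independent potential function whose unilateral changes track each player's own payoff changes, can be illustrated concretely. Below is a minimal sketch for a small tabular two-player general-sum game, using the standard near-potential condition from the potential-games literature. The payoff tables `u`, the squared-gap fitting loss, and all hyperparameters are illustrative assumptions on my part; the paper's actual MARL objective is cut off in this abstract and is not reproduced here.

```python
# Minimal sketch: fit a player-independent potential phi so that, for every
# unilateral deviation, the deviating player's payoff change matches the
# change in phi (the standard near-potential condition). All payoffs and
# hyperparameters below are hypothetical, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
n = 3  # actions per player
# u[i][a0, a1]: payoff to player i under joint action (a0, a1).
u = [rng.normal(size=(n, n)) for _ in range(2)]

phi = np.zeros((n, n))  # candidate potential, one value per joint action
lr = 0.01
for _ in range(2000):
    grad = np.zeros_like(phi)
    for a0 in range(n):
        for a1 in range(n):
            for b0 in range(n):  # player 0 deviates a0 -> b0
                gap = (phi[b0, a1] - phi[a0, a1]) - (u[0][b0, a1] - u[0][a0, a1])
                grad[b0, a1] += gap  # gradient of 0.5 * gap**2
                grad[a0, a1] -= gap
            for b1 in range(n):  # player 1 deviates a1 -> b1
                gap = (phi[a0, b1] - phi[a0, a1]) - (u[1][a0, b1] - u[1][a0, a1])
                grad[a0, b1] += gap
                grad[a0, a1] -= gap
    phi -= lr * grad

# Residual gaps measure how far the game is from an exact potential game.
gaps = []
for a0 in range(n):
    for a1 in range(n):
        for b0 in range(n):
            gaps.append(abs((phi[b0, a1] - phi[a0, a1]) - (u[0][b0, a1] - u[0][a0, a1])))
        for b1 in range(n):
            gaps.append(abs((phi[a0, b1] - phi[a0, a1]) - (u[1][a0, b1] - u[1][a0, a1])))
print("max deviation gap (near-potential distance):", max(gaps))

# Treat phi as the common utility of a cooperative game: its maximizer is a
# pure Nash equilibrium when the potential is exact, and an approximate
# equilibrium of the original game when only a near-potential exists.
a_star = np.unravel_index(np.argmax(phi), phi.shape)
print("approximate equilibrium joint action:", a_star)
```

The fitting step is a convex least-squares problem (phi is identified up to an additive constant), and the residual deviation gap quantifies how "near" the learned potential is; driving it to zero recovers an exact potential game, which matches the approximation logic described in the abstract.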