[2602.17009] Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning
Summary
The paper introduces Action Graph Policies (AGP) for multi-agent reinforcement learning, emphasizing the importance of action co-dependencies for improved coordination and performance in decentralized decision-making.
Why It Matters
This research addresses a critical challenge in multi-agent systems: how to effectively coordinate actions among agents. By modeling action dependencies, AGP enhances the ability of agents to work together, leading to better outcomes in complex environments. This has significant implications for fields like robotics and AI, where collaboration among multiple agents is essential.
Key Takeaways
- AGP models action dependencies among agents, improving coordination.
- Theoretically, AGP provides a strictly more expressive joint policy class than fully independent policies.
- Empirical results show AGP achieves 80-95% success in coordination tasks, outperforming other methods.
- AGP is particularly effective in environments with partial observability and anti-coordination penalties.
- AGP can realize coordinated joint actions that are provably better than greedy execution, even from centralized value-decomposition methods.
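The core mechanism behind these takeaways is that agents no longer act fully independently: each agent conditions its action distribution on the actions already chosen by its predecessors in a dependency graph. The paper does not publish reference code here, so the following is a toy, hypothetical sketch of that idea; the names (`parents`, `policy_logits`, `sample_joint_action`) and the hand-crafted logits are illustrative assumptions, with a trivial rule standing in for a learned network.

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, N_ACTIONS = 3, 2

# Hypothetical action-dependency graph: agent i conditions on the
# actions chosen by its parents (a "coordination context").
parents = {0: [], 1: [0], 2: [0, 1]}

def policy_logits(agent, obs, parent_actions):
    # Stand-in for a learned network: logits depend on the agent's own
    # observation plus the actions its parents have already committed to.
    ctx = sum(parent_actions)
    logits = np.zeros(N_ACTIONS)
    logits[(obs + ctx) % N_ACTIONS] = 2.0  # prefer the "compatible" action
    return logits

def sample_joint_action(observations):
    actions = {}
    for agent in sorted(parents):  # a topological order of the graph
        ctx = [actions[p] for p in parents[agent]]
        logits = policy_logits(agent, observations[agent], ctx)
        probs = np.exp(logits) / np.exp(logits).sum()
        actions[agent] = int(rng.choice(N_ACTIONS, p=probs))
    return [actions[i] for i in range(N_AGENTS)]

joint = sample_joint_action([0, 0, 0])
print(joint)
```

Sampling in topological order is what makes the joint distribution factor along the graph: later agents see a context built from earlier agents' committed actions, which is how compatible joint actions can be selected without a central controller.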
Computer Science > Machine Learning
arXiv:2602.17009 (cs) [Submitted on 19 Feb 2026]
Title: Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning
Authors: Nikunj Gupta, James Zachary Hare, Jesse Milzman, Rajgopal Kannan, Viktor Prasanna
Abstract: Coordinating actions is the most fundamental form of cooperation in multi-agent reinforcement learning (MARL). Successful decentralized decision-making often depends not only on good individual actions, but on selecting compatible actions across agents to synchronize behavior, avoid conflicts, and satisfy global constraints. In this paper, we propose Action Graph Policies (AGP), which model dependencies among agents' available action choices. AGP constructs what we call "coordination contexts", which enable agents to condition their decisions on global action dependencies. Theoretically, we show that AGPs induce a strictly more expressive joint policy than fully independent policies and can realize coordinated joint actions that are provably better than greedy execution, even from centralized value-decomposition methods. Empirically, we show that AGP achieves 80-95% success on canonical coordination tasks with partial observability and anti-coordination penalties, where other MARL methods r...
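The abstract's expressivity claim can be made concrete with a standard two-agent example (not taken from the paper; the setup below is an illustrative assumption). Suppose the desired coordinated behavior is "both agents flip the same fair coin": the joint distribution puts probability 0.5 on (0,0) and 0.5 on (1,1). No product of independent per-agent policies can represent this correlated distribution, while a policy in which agent 2 conditions on agent 1's sampled action matches it exactly.

```python
import itertools
import numpy as np

# Target coordinated behavior: P(0,0) = P(1,1) = 0.5, off-diagonal mass 0.
target = {(0, 0): 0.5, (1, 1): 0.5, (0, 1): 0.0, (1, 0): 0.0}

def tv_distance(p, q):
    # Total variation distance between two joint distributions.
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

def product_joint(p1, p2):
    # Joint induced by independent policies: agent i picks action 0 w.p. pi.
    return {(a, b): (p1 if a == 0 else 1 - p1) * (p2 if b == 0 else 1 - p2)
            for a, b in itertools.product([0, 1], repeat=2)}

# Best independent (product) approximation, by grid search over (p1, p2).
best_tv = min(
    tv_distance(target, product_joint(p1, p2))
    for p1 in np.linspace(0, 1, 101)
    for p2 in np.linspace(0, 1, 101)
)

# Dependency-conditioned policy: agent 2 copies agent 1's sampled action,
# i.e. pi2(b | a) = 1 if b == a. The induced joint matches the target.
agp_joint = {(a, b): 0.5 * (1.0 if b == a else 0.0)
             for a, b in itertools.product([0, 1], repeat=2)}

print(best_tv, tv_distance(target, agp_joint))
```

The grid search shows every product policy stays a bounded total-variation distance away from the target, while the conditioned factorization achieves it exactly, which is the sense in which conditioning on other agents' actions strictly enlarges the representable joint policy class.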