[2602.18026] Mean-Field Reinforcement Learning without Synchrony
Summary
This paper presents a new framework for Mean-Field Reinforcement Learning (MF-RL) that handles asynchrony in multi-agent systems by replacing the mean action with the population distribution as the summary statistic.
Why It Matters
The study is significant because it extends existing MF-RL theory to scenarios where agents do not act synchronously, which is common in real-world applications. By introducing the Temporal Mean Field (TMF) framework, it provides a more robust approach for analyzing and optimizing large multi-agent systems in which agents act at different times or rates.
Key Takeaways
- Introduces the Temporal Mean Field (TMF) framework for MF-RL.
- Addresses the issue of agent asynchrony in multi-agent systems.
- Proves existence and uniqueness of TMF equilibria.
- Demonstrates convergence of a new policy gradient algorithm (TMF-PG).
- Experimental results show TMF-PG performs consistently regardless of how many agents act per step.
Computer Science > Multiagent Systems
arXiv:2602.18026 (cs) [Submitted on 20 Feb 2026]
Title: Mean-Field Reinforcement Learning without Synchrony
Authors: Shan Yang
Abstract: Mean-field reinforcement learning (MF-RL) scales multi-agent RL to large populations by reducing each agent's dependence on others to a single summary statistic -- the mean action. However, this reduction requires every agent to act at every time step; when some agents are idle, the mean action is simply undefined. Addressing asynchrony therefore requires a different summary statistic -- one that remains defined regardless of which agents act. The population distribution $\mu \in \Delta(\mathcal{O})$ -- the fraction of agents at each observation -- satisfies this requirement: its dimension is independent of $N$, and under exchangeability it fully determines each agent's reward and transition. Existing MF-RL theory, however, is built on the mean action and does not extend to $\mu$. We therefore construct the Temporal Mean Field (TMF) framework around the population distribution $\mu$ from scratch, covering the full spectrum from fully synchronous to purely sequential decision-making within a single theory. We prove existence and uniqueness of TMF equilibria, establish an $O(1/\sqrt{N})$ finite-population approximation bound that holds regardless of how many agents act per step, and...
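The abstract's central point -- that the population distribution $\mu$ stays well defined when agents are idle while the mean action does not -- can be illustrated with a minimal sketch. This is not the paper's implementation; the function names and the numeric setup are illustrative assumptions, using a finite observation set indexed by integers.

```python
import numpy as np

def population_distribution(observations, num_obs):
    """Empirical distribution mu in Delta(O): the fraction of agents
    at each observation index. Every agent always occupies some
    observation, so mu is defined no matter which agents act."""
    counts = np.bincount(observations, minlength=num_obs)
    return counts / counts.sum()

def mean_action(actions):
    """Mean action over agents that acted this step (None = idle).
    Returns None when no agent acted, i.e. the statistic is undefined."""
    acted = [a for a in actions if a is not None]
    return None if not acted else sum(acted) / len(acted)

# Hypothetical example: N = 5 agents, 3 possible observations,
# and only two agents act this step.
obs = np.array([0, 0, 1, 2, 2])
acts = [None, 0.5, None, None, 1.5]  # idle agents contribute no action

print(population_distribution(obs, num_obs=3))  # [0.4 0.2 0.4]
print(mean_action(acts))                        # 1.0
print(mean_action([None] * 5))                  # None -- undefined step
```

The sketch shows why $\mu$ has dimension $|\mathcal{O}|$, independent of $N$, and remains a valid summary statistic even on steps where every agent is idle.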