[2602.18291] Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
Summary
This paper introduces OMAD, an online multi-agent reinforcement learning framework that uses diffusion policies to improve coordination and sample efficiency across diverse tasks.
Why It Matters
The research addresses a significant gap in the application of diffusion models to online multi-agent reinforcement learning, presenting an approach that improves both coordination and exploration. This advance could yield more sample-efficient algorithms, making it relevant to researchers and practitioners in multi-agent RL.
Key Takeaways
- OMAD framework enhances agent coordination using diffusion policies.
- Introduces a relaxed policy objective that maximizes scaled joint entropy.
- Demonstrates state-of-the-art performance with 2.5x to 5x sample efficiency improvement.
- Utilizes centralized training with decentralized execution for stability.
- Addresses the challenges of intractable likelihoods in diffusion models.
Computer Science > Artificial Intelligence
arXiv:2602.18291 (cs) [Submitted on 20 Feb 2026]
Title: Diffusing to Coordinate: Efficient Online Multi-Agent Diffusion Policies
Authors: Zhuoran Li, Hai Zhong, Xun Wang, Qingxin Xia, Lihua Zhang, Longbo Huang
Abstract: Online Multi-Agent Reinforcement Learning (MARL) is a prominent framework for efficient agent coordination, and enhancing policy expressiveness is pivotal for achieving superior performance. Diffusion-based generative models are well positioned to meet this demand, having demonstrated remarkable expressiveness and multimodal representation in image generation and offline settings. Yet their potential in online MARL remains largely under-explored. A major obstacle is that the intractable likelihoods of diffusion models impede entropy-based exploration and coordination. To tackle this challenge, we propose OMAD, among the first Online off-policy MARL frameworks using Diffusion policies to orchestrate coordination. Our key innovation is a relaxed policy objective that maximizes scaled joint entropy, facilitating effective exploration without relying on tractable likelihoods. Complementing this, within the centralized training with decentralized execution (CTDE) paradigm, we employ a joint distributional v...
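To make the idea of a diffusion policy concrete, the sketch below shows how an agent's action can be drawn by reverse denoising conditioned on its observation, which is the standard DDPM-style sampling loop rather than the paper's specific algorithm. Everything here is illustrative: `eps_model` is a hypothetical noise predictor (in practice a trained network), the linear beta schedule and step count `T` are arbitrary choices, and the action is 1-D for simplicity. Note that the final action is produced without ever evaluating the policy's likelihood, which is why entropy-based methods need a workaround such as OMAD's relaxed objective.

```python
import math
import random

def sample_action(obs, eps_model, T=10):
    """Illustrative reverse-diffusion sampler for a 1-D continuous action.

    Starts from Gaussian noise and iteratively denoises conditioned on the
    observation. `eps_model(obs, a_t, t)` is a hypothetical noise predictor;
    the schedule below is a toy linear beta schedule, not the paper's.
    """
    betas = [0.1 * (t + 1) / T for t in range(T)]       # toy linear schedule
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:                                    # cumulative products
        prod *= a
        alpha_bars.append(prod)

    a_t = random.gauss(0.0, 1.0)                        # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(obs, a_t, t)                    # predicted noise
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = (a_t - coef * eps) / math.sqrt(alphas[t])
        noise = random.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise at t = 0
        a_t = mean + math.sqrt(betas[t]) * noise
    return a_t

# Usage with a dummy predictor that always outputs zero noise.
action = sample_action(obs=0.0, eps_model=lambda o, a, t: 0.0)
print(math.isfinite(action))
```

Because the sampler only composes Gaussian steps through a learned network, the density of the resulting action distribution has no closed form; that intractability is exactly the obstacle the paper's scaled-joint-entropy objective is designed to sidestep.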