[2508.06269] OM2P: Offline Multi-Agent Mean-Flow Policy
Computer Science > Machine Learning
arXiv:2508.06269 (cs)
[Submitted on 8 Aug 2025 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: OM2P: Offline Multi-Agent Mean-Flow Policy
Authors: Zhuoran Li, Xun Wang, Hai Zhong, Qingxin Xia, Lihua Zhang, Longbo Huang

Abstract: Generative models, especially diffusion- and flow-based models, have shown promise in offline multi-agent reinforcement learning (MARL). However, integrating powerful generative models into this framework poses unique challenges. In particular, diffusion- and flow-based policies suffer from low sampling efficiency due to their iterative generation processes, making them impractical in time-sensitive or resource-constrained settings. To tackle these difficulties, we propose OM2P (Offline Multi-Agent Mean-Flow Policy), a novel offline MARL algorithm that achieves efficient one-step action sampling. To address the misalignment between generative objectives and reward maximization, we introduce a reward-aware optimization scheme that integrates a carefully designed mean-flow matching loss with Q-function supervision. Additionally, we design a generalized timestep distribution and a derivative-free estimation strategy to reduce memory overhead and improve training stability. Empirical evaluations on Multi-Agent Particle and MuJoCo benchmarks demonstrate that OM2P achieves superior performance...
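
The abstract describes the training objective only in words; below is a minimal, single-agent PyTorch sketch of how a mean-flow policy with one-step action sampling and Q-function supervision could be wired together. It is not the paper's implementation: all identifiers (MeanFlowPolicy, QNet, om2p_loss, alpha, delta) are hypothetical, the finite-difference target stands in for the paper's derivative-free estimator, and a plain Uniform(0, 1) timestep distribution replaces the paper's generalized one.

```python
# Minimal single-agent sketch of the ideas described in the abstract.
# All names are hypothetical; the finite-difference target stands in for the
# paper's derivative-free estimator, and Uniform(0, 1) timesteps replace its
# generalized distribution.
import torch
import torch.nn as nn


class MeanFlowPolicy(nn.Module):
    """Predicts the average velocity u(a_t, r, t | obs) of a noise-to-action flow."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.act_dim = act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, a_t, r, t):
        return self.net(torch.cat([obs, a_t, r, t], dim=-1))

    @torch.no_grad()
    def act(self, obs):
        """One-step sampling: push pure noise along the average velocity over [0, 1]."""
        noise = torch.randn(obs.shape[0], self.act_dim)
        zeros, ones = torch.zeros(obs.shape[0], 1), torch.ones(obs.shape[0], 1)
        return noise - self(obs, noise, zeros, ones)


class QNet(nn.Module):
    """Critic used for reward-aware supervision of the generative policy."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def om2p_loss(policy, q_net, obs, actions, alpha=1.0, delta=1e-2):
    """Mean-flow matching loss plus Q-function supervision (illustrative only)."""
    batch, _ = actions.shape
    eps = torch.randn_like(actions)
    t = torch.rand(batch, 1)
    r = torch.rand(batch, 1) * t                  # sample r <= t
    a_t = (1 - t) * actions + t * eps             # linear interpolation path
    v = eps - actions                             # instantaneous velocity target

    u = policy(obs, a_t, r, t)
    with torch.no_grad():                         # derivative-free: no JVP through the target
        a_td = (1 - (t + delta)) * actions + (t + delta) * eps
        du_dt = (policy(obs, a_td, r, t + delta) - policy(obs, a_t, r, t)) / delta
        u_target = v - (t - r) * du_dt            # MeanFlow identity: u = v - (t - r) du/dt
    mf_loss = ((u - u_target) ** 2).mean()

    # One-step action sample, scored by the critic to align generation with reward.
    noise = torch.randn_like(actions)
    a_hat = noise - policy(obs, noise, torch.zeros(batch, 1), torch.ones(batch, 1))
    q_loss = -q_net(obs, a_hat).mean()

    return mf_loss + alpha * q_loss


if __name__ == "__main__":
    obs_dim, act_dim = 10, 4
    policy, critic = MeanFlowPolicy(obs_dim, act_dim), QNet(obs_dim, act_dim)
    loss = om2p_loss(policy, critic, torch.randn(32, obs_dim), torch.randn(32, act_dim))
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")
```

Note that sampling at inference (act) is a single forward pass rather than an iterative denoising loop, which is the efficiency argument the abstract makes for mean-flow policies over diffusion- and flow-based ones.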