[2602.12520] Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings
Summary
This paper presents a novel framework for multi-agent model-based reinforcement learning, integrating joint state-action representation learning with imaginative roll-outs to enhance coordination among agents in dynamic environments.
Why It Matters
The research addresses the complexities of coordinating multiple agents in environments where information is limited and dynamics are unpredictable. By improving representation learning and planning, this framework has the potential to advance applications in robotics, gaming, and other fields reliant on multi-agent systems.
Key Takeaways
- Introduces a model-based framework that combines joint state-action representation with imaginative roll-outs.
- Trains the world model with variational auto-encoders to improve data efficiency.
- Demonstrates improved long-term planning through empirical studies on established multi-agent benchmarks.
- Highlights the effectiveness of joint state-action learned embeddings in optimizing agent interactions.
- Provides a foundation for future research in multi-agent systems and reinforcement learning.
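To make the value-decomposition idea concrete, here is a minimal NumPy sketch of how state-action embeddings can feed per-agent action values that are then combined by a monotonic mixing network (in the QMIX style the paper builds on). All function and variable names (`sale_embedding`, `monotonic_mix`, the linear heads) are illustrative assumptions, not the authors' implementation; the paper's SALE module and mixing network are learned end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)

def sale_embedding(state, action, W_s, W_a):
    """Sketch of a state-action learned embedding: project state and
    action into a shared latent space and concatenate the two codes."""
    z_s = np.tanh(W_s @ state)
    z_a = np.tanh(W_a @ action)
    return np.concatenate([z_s, z_a])

def monotonic_mix(q_values, global_state, W_hyper, b_hyper):
    """QMIX-style monotonic mixing: hypernetwork weights are made
    non-negative with abs(), so the joint value is monotone in each
    agent's individual action value."""
    w = np.abs(W_hyper @ global_state + b_hyper)  # one weight per agent
    return float(w @ q_values)

n_agents, state_dim, action_dim, embed_dim = 3, 4, 2, 8
W_s = rng.normal(size=(embed_dim, state_dim))
W_a = rng.normal(size=(embed_dim, action_dim))

# Per-agent action values from SALE embeddings (linear heads for brevity).
heads = rng.normal(size=(n_agents, 2 * embed_dim))
states = rng.normal(size=(n_agents, state_dim))
actions = rng.normal(size=(n_agents, action_dim))
q = np.array([heads[i] @ sale_embedding(states[i], actions[i], W_s, W_a)
              for i in range(n_agents)])

# Mix individual values into a joint action-value estimate.
global_state = states.flatten()
W_hyper = rng.normal(size=(n_agents, n_agents * state_dim))
b_hyper = rng.normal(size=n_agents)
q_tot = monotonic_mix(q, global_state, W_hyper, b_hyper)
print(q_tot)
```

The abs() trick is what guarantees the monotonicity constraint: raising any single agent's value can never lower the joint value, which is what lets decentralized greedy action selection stay consistent with the centralized estimate.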
Computer Science > Machine Learning
arXiv:2602.12520 (cs)
[Submitted on 13 Feb 2026]
Title: Multi-Agent Model-Based Reinforcement Learning with Joint State-Action Learned Embeddings
Authors: Zhizun Wang, David Meger
Abstract: Learning to coordinate many agents in partially observable and highly dynamic environments requires both informative representations and data-efficient training. To address this challenge, we present a novel model-based multi-agent reinforcement learning framework that unifies joint state-action representation learning with imaginative roll-outs. We design a world model trained with variational auto-encoders and augment the model using the state-action learned embedding (SALE). SALE is injected into both the imagination module that forecasts plausible future roll-outs and the joint agent network whose individual action values are combined through a mixing network to estimate the joint action-value function. By coupling imagined trajectories with SALE-based action values, the agents acquire a richer understanding of how their choices influence collective outcomes, leading to improved long-term planning and optimization under limited real-environment interactions. Empirical studies on well-established multi-agent benchmarks, including StarCraft II Micro-Management, Multi-Ag...
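The imagination module described in the abstract generates roll-outs from the learned world model instead of the real environment, which is where the data efficiency comes from. The sketch below illustrates that pattern with a hypothetical linear dynamics stub standing in for the paper's VAE-trained model; `world_model_step`, `imagine_rollout`, and the random policy are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
state_dim, action_dim = 4, 2

# Stand-in for a learned world model: a fixed linear dynamics stub.
# In the paper, next-state predictions would come from a VAE-trained model.
A = rng.normal(scale=0.5, size=(state_dim, state_dim))
B = rng.normal(scale=0.5, size=(state_dim, action_dim))

def world_model_step(state, action):
    """Predict the next state from the current state and action."""
    return np.tanh(A @ state + B @ action)

def imagine_rollout(start_state, policy, horizon=5):
    """Unroll the world model for `horizon` steps, collecting imagined
    (state, action, next_state) transitions without any real-environment
    interaction. These can then augment the real training data."""
    traj, s = [], start_state
    for _ in range(horizon):
        a = policy(s)
        s_next = world_model_step(s, a)
        traj.append((s, a, s_next))
        s = s_next
    return traj

random_policy = lambda s: rng.normal(size=action_dim)
rollout = imagine_rollout(np.zeros(state_dim), random_policy, horizon=5)
print(len(rollout))  # 5 imagined transitions
```

Because each imagined step consumes no environment samples, the agents can train their value estimates on many such trajectories per real transition, at the cost of compounding model error over long horizons.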