[2601.07463] Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning
Summary
This paper presents a novel Local-to-Global (LOGO) world model for offline multi-agent reinforcement learning (MARL), improving policy generalization by leveraging local predictions to infer global dynamics and enhance synthetic data generation.
Why It Matters
The research addresses critical challenges in offline MARL, particularly the tendency of existing methods to constrain training to the dataset distribution and thereby produce overly conservative policies. By improving prediction accuracy and reducing the computational overhead of ensemble-based world models, this work has significant implications for multi-agent systems operating in complex environments.
Key Takeaways
- Introduces a Local-to-Global (LOGO) world model for offline MARL.
- Enhances prediction accuracy by leveraging local predictions for global dynamics.
- Implements an uncertainty-aware sampling mechanism to improve policy learning.
- Demonstrates superior performance against state-of-the-art baselines in multiple scenarios.
- Reduces computational overhead compared to conventional ensemble methods.
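The core idea in the takeaways above, inferring global dynamics from per-agent local predictions, can be illustrated with a minimal sketch. Note that `LocalToGlobalModel`, `make_local_model`, and the concatenation step are hypothetical simplifications for illustration; the paper's actual architecture is not reproduced here.

```python
import numpy as np

class LocalToGlobalModel:
    """Hedged sketch: one small dynamics model per agent predicts that
    agent's next local observation; the global next state is assembled
    by "puzzling together" the local predictions (here, concatenation)."""

    def __init__(self, local_models):
        self.local_models = local_models  # one predictor per agent

    def predict_global(self, local_obs, actions):
        # Each local model sees only its own agent's observation and action,
        # which is a lower-dimensional (easier) prediction problem.
        local_next = [m(o, a) for m, o, a in
                      zip(self.local_models, local_obs, actions)]
        # Global state inferred from the local pieces.
        return np.concatenate(local_next)

def make_local_model(step=0.1):
    # Toy local dynamics: observation drifts in the action direction.
    def model(obs, action):
        return obs + step * action
    return model

n_agents = 3
wm = LocalToGlobalModel([make_local_model() for _ in range(n_agents)])
obs = [np.zeros(2) for _ in range(n_agents)]
acts = [np.ones(2) for _ in range(n_agents)]
state = wm.predict_global(obs, acts)
print(state.shape)  # (6,)
```

The design point is that each local model's input and output spaces stay small regardless of the number of agents, while the combination step implicitly ties the agents' dynamics together.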
Computer Science > Artificial Intelligence
arXiv:2601.07463 (cs)
[Submitted on 12 Jan 2026 (v1), last revised 19 Feb 2026 (this version, v2)]
Title: Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning
Authors: Sijia Li, Xinran Li, Shibo Chen, Jun Zhang
Abstract: Offline multi-agent reinforcement learning (MARL) aims to solve cooperative decision-making problems in multi-agent systems using pre-collected datasets. Existing offline MARL methods primarily constrain training within the dataset distribution, resulting in overly conservative policies that struggle to generalize beyond the support of the data. While model-based approaches offer a promising solution by expanding the original dataset with synthetic data generated from a learned world model, the high dimensionality, non-stationarity, and complexity of multi-agent systems make it challenging to accurately estimate the transitions and reward functions in offline MARL. Given the difficulty of directly modeling joint dynamics, we propose a local-to-global (LOGO) world model, a novel framework that leverages local predictions, which are easier to estimate, to infer global state dynamics, thus improving prediction accuracy while implicitly capturing agent-wise dependencies. Using the tra...
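The uncertainty-aware sampling mechanism mentioned in the takeaways is not fully specified in the excerpt above, but a common pattern in model-based offline RL is to gate synthetic transitions by ensemble disagreement. The sketch below assumes that pattern; the ensemble construction, `threshold`, and helper names are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_uncertainty(models, obs, action):
    # Disagreement (std) across ensemble predictions as an uncertainty proxy.
    preds = np.stack([m(obs, action) for m in models])
    return preds.std(axis=0).mean()

def sample_synthetic(models, queries, threshold):
    """Hedged sketch: generate model-based transitions, but keep only those
    whose ensemble disagreement falls below a threshold; highly uncertain
    rollouts are discarded rather than fed to the policy learner."""
    kept = []
    for obs, action in queries:
        if ensemble_uncertainty(models, obs, action) < threshold:
            next_obs = np.mean([m(obs, action) for m in models], axis=0)
            kept.append((obs, action, next_obs))
    return kept

# Toy ensemble: linear dynamics with slightly perturbed weights.
models = [lambda o, a, w=w: w * o + a for w in (0.9, 1.0, 1.1)]
queries = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(10)]
accepted = sample_synthetic(models, queries, threshold=0.2)
print(f"{len(accepted)} of {len(queries)} synthetic transitions kept")
```

Filtering (or down-weighting) by disagreement keeps synthetic data close to regions the model predicts reliably, which is the usual motivation for uncertainty-aware sampling in offline settings.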