[2602.14857] World Models for Policy Refinement in StarCraft II
Summary
The paper presents StarWM, a world model for StarCraft II that predicts future observations under partial observability and is used to refine an LLM-based decision-making policy. It reports substantial gains in offline predictive accuracy over zero-shot baselines, along with improved gameplay performance.
Why It Matters
Existing LLM-based StarCraft II agents focus on improving the policy itself and leave out a learnable, action-conditioned transition model. This research brings such a model into the decision loop for an environment with a massive state-action space and partial observability. The findings could influence future AI applications in gaming and in other settings that require strategic decision-making under uncertainty.
Key Takeaways
- StarWM introduces a world model that enhances decision-making in StarCraft II.
- The model predicts future observations under partial observability, improving policy refinement.
- In offline evaluation, StarWM improves resource prediction accuracy by nearly 60% over zero-shot baselines.
- The integrated decision system yields win-rate gains against SC2's AI.
- Structured textual representation aids in learning SC2's hybrid dynamics.
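To make the idea of an action-conditioned world model inside the decision loop concrete, here is a minimal sketch in Python. It is not the paper's implementation: the function names (`refine_action`, `toy_propose`, `toy_predict`, `toy_evaluate`), the three-field observation, and the scoring rule are all hypothetical stand-ins, and the toy dynamics are hand-written rather than learned. The sketch only shows the general pattern of proposing actions, predicting the next observation for each, and keeping the action whose predicted outcome scores best.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, heavily simplified observation: three integer fields.
Observation = Dict[str, int]
Action = str


@dataclass
class Candidate:
    action: Action
    predicted: Observation  # world model's guess at the next observation
    score: float            # how desirable that predicted observation is


def refine_action(
    obs: Observation,
    propose: Callable[[Observation], List[Action]],
    predict: Callable[[Observation, Action], Observation],
    evaluate: Callable[[Observation], float],
) -> Candidate:
    """Return the proposed action whose predicted next observation scores best."""
    candidates = []
    for action in propose(obs):
        predicted = predict(obs, action)  # action-conditioned prediction
        candidates.append(Candidate(action, predicted, evaluate(predicted)))
    if not candidates:
        raise ValueError("policy proposed no actions")
    return max(candidates, key=lambda c: c.score)


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_propose(obs: Observation) -> List[Action]:
    # Stand-in for a policy that proposes a few candidate actions.
    return ["train_worker", "build_supply", "idle"]


def toy_predict(obs: Observation, action: Action) -> Observation:
    # Stand-in for a learned world model: hand-written toy dynamics.
    nxt = dict(obs)
    nxt["minerals"] += 40  # toy passive income per step
    if action == "train_worker" and obs["minerals"] >= 50:
        nxt["minerals"] -= 50
        nxt["workers"] += 1
    elif action == "build_supply" and obs["minerals"] >= 100:
        nxt["minerals"] -= 100
        nxt["supply_cap"] += 8
    return nxt


def toy_evaluate(obs: Observation) -> float:
    # Arbitrary toy preference: workers matter most, then supply, then minerals.
    return obs["workers"] * 60 + obs["supply_cap"] * 5 + obs["minerals"]


if __name__ == "__main__":
    start = {"minerals": 120, "workers": 12, "supply_cap": 15}
    best = refine_action(start, toy_propose, toy_predict, toy_evaluate)
    print(best.action, best.predicted, best.score)
```

In a real system the `predict` callable would be a learned model operating on a much richer, partially observed state, and the refinement step could feed the prediction back to the policy instead of ranking candidates with a fixed score.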
Computer Science > Artificial Intelligence
arXiv:2602.14857 (cs)
[Submitted on 16 Feb 2026]
Title: World Models for Policy Refinement in StarCraft II
Authors: Yixin Zhang, Ziyi Wang, Yiming Rong, Haoxi Wang, Jinling Jiang, Shuang Xu, Haoran Wu, Shiyu Zhou, Bo Xu
Abstract: Large Language Models (LLMs) have recently shown strong reasoning and generalization capabilities, motivating their use as decision-making policies in complex environments. StarCraft II (SC2), with its massive state-action space and partial observability, is a challenging testbed. However, existing LLM-based SC2 agents primarily focus on improving the policy itself and overlook integrating a learnable, action-conditioned transition model into the decision loop. To bridge this gap, we propose StarWM, the first world model for SC2 that predicts future observations under partial observability. To facilitate learning SC2's hybrid dynamics, we introduce a structured textual representation that factorizes observations into five semantic modules, and construct SC2-Dynamics-50k, the first instruction-tuning dataset for SC2 dynamics prediction. We further develop a multi-dimensional offline evaluation framework for predicted structured observations. Offline results show StarWM's substantial gains over zero-shot baselines, including nearly 60% improvements in resource predi...