[2604.04983] Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
Computer Science > Machine Learning
arXiv:2604.04983 (cs)
[Submitted on 4 Apr 2026]
Title: Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
Authors: Diyansha Singh
Abstract: We present Territory Paint Wars, a minimal competitive multi-agent reinforcement learning environment implemented in Unity, and use it to systematically investigate failure modes of Proximal Policy Optimisation (PPO) under self-play. A first agent trained for $84{,}000$ episodes achieves only $26.8\%$ win rate against a uniformly-random opponent in a symmetric zero-sum game. Through controlled ablations we identify five implementation-level failure modes -- reward-scale imbalance, missing terminal signal, ineffective long-horizon credit assignment, unnormalised observations, and incorrect win detection -- each of which contributes critically to this failure in this setting. After correcting these issues, we uncover a distinct emergent pathology: competitive overfitting, where co-adapting agents maintain stable self-play performance while generalisation win rate collapses from $73.5\%$ to $21.6\%$. Critically, this failure is undetectable via standard self-play metrics: both agents co-adapt equally, so the self-play win rate remains near $50\%$ throughout the collapse. We propose a minimal inte...
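The abstract's point that competitive overfitting is invisible to self-play metrics can be illustrated with a small monitoring check. The Python sketch below is not the paper's method or its proposed intervention. The `Checkpoint` type, the function name, and both thresholds are hypothetical, and the self-play value of 0.50 is a stand-in for "near 50%". Only the 73.5% and 21.6% generalisation figures come from the abstract.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    # Win rate of the learning agent against its co-trained self-play opponent.
    self_play_win_rate: float
    # Win rate against a fixed held-out opponent (e.g. a uniformly random policy).
    baseline_win_rate: float


def flags_competitive_overfitting(history, max_drop=0.2, self_play_band=0.1):
    """Return True when the latest baseline win rate has fallen more than
    `max_drop` below its peak while the latest self-play win rate is still
    within `self_play_band` of 0.5. Both thresholds are illustrative."""
    if not history:
        return False
    peak = max(c.baseline_win_rate for c in history)
    latest = history[-1]
    self_play_looks_fine = abs(latest.self_play_win_rate - 0.5) <= self_play_band
    collapsed = (peak - latest.baseline_win_rate) > max_drop
    return self_play_looks_fine and collapsed


# Two illustrative checkpoints: self-play stays at 0.50 (assumed value),
# while the generalisation win rate goes from 73.5% to 21.6% (from the abstract).
history = [Checkpoint(0.50, 0.735), Checkpoint(0.50, 0.216)]
print(flags_competitive_overfitting(history))
```

The design choice is simply to log a second metric against an opponent that does not co-adapt, since a symmetric self-play win rate carries no information about absolute skill.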