[2603.22846] CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models
Computer Science > Artificial Intelligence
arXiv:2603.22846 (cs)
[Submitted on 24 Mar 2026]

Title: CoMaTrack: Competitive Multi-Agent Game-Theoretic Tracking with Vision-Language-Action Models
Authors: Youzhi Liu, Li Gao, Liu Liu, Mingyang Lv, Yang Cai

Abstract: Embodied Visual Tracking (EVT), a core dynamic task in embodied intelligence, requires an agent to precisely follow a language-specified target. Yet most existing methods rely on single-agent imitation learning, suffering from costly expert data and limited generalization due to static training environments. Inspired by competition-driven capability evolution, we propose CoMaTrack, a competitive game-theoretic multi-agent reinforcement learning framework that trains agents in a dynamic adversarial setting with competitive subtasks, yielding stronger adaptive planning and interference-resilient strategies. We further introduce CoMaTrack-Bench, the first benchmark for competitive EVT, featuring game scenarios between a tracker and adaptive opponents across diverse environments and instructions, enabling standardized robustness evaluation under active adversarial interactions. Experiments show that CoMaTrack achieves state-of-the-art results on both standard benchmarks and CoMaTrack-Bench. Notably, a 3B VLM trained with our framework s…
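The abstract describes training a tracker against adaptive opponents in a competitive game. As a minimal illustration of the game-theoretic setup (not the paper's actual implementation — the function names, distance-based reward shape, and parameters below are all assumptions for exposition), one common formulation makes the tracker-vs-evader interaction zero-sum: the tracker is rewarded for holding the target near a desired following distance, and the evader receives the negative of that reward.

```python
import math

# Hypothetical sketch of a zero-sum tracker-vs-evader reward, as used in
# competitive embodied-tracking games. Not taken from the paper; the
# desired_dist and max_dist parameters are illustrative assumptions.

def tracker_reward(tracker_xy, target_xy, desired_dist=2.0, max_dist=10.0):
    """Reward in [-1, 1]: 1.0 when the target sits exactly at the
    desired following distance, decreasing linearly with distance error."""
    dx = target_xy[0] - tracker_xy[0]
    dy = target_xy[1] - tracker_xy[1]
    dist = math.hypot(dx, dy)
    err = abs(dist - desired_dist)
    return max(-1.0, 1.0 - 2.0 * err / max_dist)

def evader_reward(tracker_xy, target_xy, **kw):
    # Zero-sum coupling: the evader gains exactly what the tracker loses,
    # which is what drives the competition-driven capability evolution
    # the abstract refers to.
    return -tracker_reward(tracker_xy, target_xy, **kw)
```

Under this kind of coupling, any strategy improvement by the evader directly lowers the tracker's return, forcing the tracker policy to adapt — the dynamic adversarial curriculum the abstract contrasts with static imitation-learning environments.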