[2603.02604] Heterogeneous Agent Collaborative Reinforcement Learning
Computer Science > Machine Learning
arXiv:2603.02604 (cs)
[Submitted on 3 Mar 2026]

Title: Heterogeneous Agent Collaborative Reinforcement Learning
Authors: Zhixia Zhang, Zixuan Huang, Xin Xia, Deqing Wang, Fuzhen Zhuang, Shuai Ma, Ning Ding, Yaodong Yang, Jianxin Li, Yikun Ban

Abstract: We introduce Heterogeneous Agent Collaborative Reinforcement Learning (HACRL), a new learning paradigm that addresses the inefficiencies of isolated on-policy optimization. HACRL enables collaborative optimization with independent execution: heterogeneous agents share verified rollouts during training to mutually improve, while operating independently at inference time. Unlike LLM-based multi-agent reinforcement learning (MARL), HACRL does not require coordinated deployment, and unlike on-/off-policy distillation, it enables bidirectional mutual learning among heterogeneous agents rather than one-directional teacher-to-student transfer. Building on this paradigm, we propose HACPO, a collaborative RL algorithm that enables principled rollout sharing to maximize sample utilization and cross-agent knowledge transfer. To mitigate capability discrepancies and policy distribution shifts, HACPO introduces four tailored mechanisms with theoretical guarantees on unbiased advantage estimation and optimization correctness. Extensive experiments across ...
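The abstract does not spell out HACPO's four mechanisms, but any scheme in which one agent trains on rollouts generated by another policy must correct for the distribution shift between the two. A minimal sketch of the standard correction, assuming (hypothetically) that HACPO uses a clipped importance-sampling reweighting of advantages; the function name and clipping scheme are illustrative, not from the paper:

```python
# Hypothetical sketch: importance-weighted advantage correction when a
# learner agent reuses a rollout produced by another agent's (behavior)
# policy. Illustrates the generic off-policy correction such rollout
# sharing requires; not HACPO's actual mechanism.

import math

def corrected_advantages(logp_learner, logp_behavior, advantages, clip=2.0):
    """Reweight each advantage by the clipped likelihood ratio between the
    learner's and the behavior policy's per-action log-probabilities."""
    assert len(logp_learner) == len(logp_behavior) == len(advantages)
    out = []
    for lp_l, lp_b, adv in zip(logp_learner, logp_behavior, advantages):
        ratio = math.exp(lp_l - lp_b)   # importance weight pi_learner / pi_behavior
        ratio = min(ratio, clip)        # truncate large ratios to bound variance
        out.append(ratio * adv)
    return out

# When both policies assign identical log-probs, the ratio is 1 and the
# advantages pass through unchanged.
print(corrected_advantages([-1.0, -0.5], [-1.0, -0.5], [0.3, -0.2]))
```

Truncating the ratio (as above) trades a small bias for bounded variance; the paper's claim of unbiased advantage estimation suggests HACPO uses a more careful mechanism than this plain clipping.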