[2505.04317] Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
Summary
This paper presents Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework for 3v3 multi-drone volleyball that combines high-level strategic coordination with low-level agile control.
Why It Matters
The research addresses complex challenges in multi-agent systems, showcasing advancements in reinforcement learning that can be applied to various robotics and AI applications. The findings could influence future developments in cooperative AI and robotics, particularly in dynamic environments.
Key Takeaways
- Introduces Hierarchical Co-Self-Play (HCSP) for multi-drone volleyball.
- Demonstrates superior performance with an average 82.9% win rate.
- Highlights the emergence of coordinated team behaviors through co-self-play.
- Separates high-level strategy from low-level control for effective learning.
- Provides a new training pipeline that does not require expert demonstrations.
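The separation of high-level strategy from low-level control can be illustrated with a minimal sketch. All class and function names below are hypothetical placeholders, not the paper's actual code: a centralized strategy module observes the whole team and assigns a skill to each drone, while decentralized controllers map each drone's own observation and assigned skill to a motor command.

```python
# Hypothetical sketch of the hierarchical split: centralized strategy,
# decentralized control. Names and skill set are illustrative only.
import random

class HighLevelStrategy:
    """Centralized: sees the full team state, outputs one skill id per drone."""
    def act(self, team_obs):
        # Placeholder: pick a random skill (e.g. 0=hover, 1=set, 2=spike) per drone.
        return [random.randrange(3) for _ in team_obs]

class LowLevelController:
    """Decentralized: maps (own observation, assigned skill) to a motor command."""
    def act(self, obs, skill_id):
        # Placeholder control: zero thrust adjustment on four rotors,
        # tagged with the skill the strategy assigned.
        return {"skill": skill_id, "thrust": [0.0, 0.0, 0.0, 0.0]}

def hierarchical_step(strategy, controllers, team_obs):
    """One environment step: strategy assigns skills, controllers execute them."""
    skills = strategy.act(team_obs)
    return [ctrl.act(obs, s) for ctrl, obs, s in zip(controllers, team_obs, skills)]

# Three drones, as in the 3v3 setting.
team_obs = [{"pos": (0, 0, 1)}, {"pos": (1, 0, 1)}, {"pos": (2, 0, 1)}]
controllers = [LowLevelController() for _ in range(3)]
actions = hierarchical_step(HighLevelStrategy(), controllers, team_obs)
print(len(actions))  # one motor command per drone
```

The point of the split is that the strategy layer reasons over team-level state at a coarse timescale, while each controller handles the drone's underactuated dynamics locally.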
Computer Science > Artificial Intelligence, arXiv:2505.04317 (cs)
[Submitted on 7 May 2025 (v1), last revised 26 Feb 2026 (this version, v5)]
Authors: Ruize Zhang, Sirui Xiang, Zelai Xu, Feng Gao, Shilong Ji, Wenhao Tang, Wenbo Ding, Chao Yu, Yu Wang
Abstract: In this paper, we tackle the problem of learning to play 3v3 multi-drone volleyball, a new embodied competitive task that requires both high-level strategic coordination and low-level agile control. The task is turn-based, multi-agent, and physically grounded, posing significant challenges due to its long-horizon dependencies, tight inter-agent coupling, and the underactuated dynamics of quadrotors. To address this, we propose Hierarchical Co-Self-Play (HCSP), a hierarchical reinforcement learning framework that separates centralized high-level strategic decision-making from decentralized low-level motion control. We design a three-stage population-based training pipeline to enable both strategy and skill to emerge from scratch without expert demonstrations: (I) training diverse low-level skills, (II) learning high-level strategy via self-play with fixed low-level skills, and (III) joint fine-tuning through co-self-play. Experiments show that ...
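The three-stage pipeline in the abstract can be outlined as a control flow. This is a minimal sketch under loose assumptions; the function names are hypothetical and stand in for full training routines, not the authors' API.

```python
# Hypothetical outline of the three-stage population-based pipeline:
# (I) diverse low-level skills, (II) self-play over frozen skills,
# (III) joint co-self-play fine-tuning. All names are illustrative.

def train_low_level_skills(n_skills):
    """Stage I: train a diverse set of low-level motion skills from scratch."""
    return [f"skill_{i}" for i in range(n_skills)]

def self_play_strategy(skills, generations):
    """Stage II: grow a population of strategies via self-play; skills stay fixed."""
    population = [{"skills": skills, "strategy": 0}]
    for _ in range(generations):
        best = population[-1]
        # Each generation trains a new strategy against the current best.
        population.append({"skills": skills, "strategy": best["strategy"] + 1})
    return population

def co_self_play_finetune(population):
    """Stage III: jointly fine-tune strategy and skills through co-self-play."""
    for agent in population:
        agent["finetuned"] = True
    return population[-1]

skills = train_low_level_skills(n_skills=4)
population = self_play_strategy(skills, generations=3)
champion = co_self_play_finetune(population)
print(champion["strategy"], champion["finetuned"])
```

The staging matters: freezing skills in Stage II keeps the strategy-learning problem stationary, and only Stage III lets both levels adapt to each other.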