[2602.19400] Hilbert-Augmented Reinforcement Learning for Scalable Multi-Robot Coverage and Exploration
Summary
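The paper presents a coverage framework that integrates Hilbert space-filling priors into decentralized multi-robot learning and execution. Augmenting DQN and PPO with Hilbert-based spatial indices improves coverage efficiency, redundancy, and convergence speed over the unmodified baselines in sparse-reward grid environments.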
Why It Matters
Multi-robot coverage suffers from inefficient exploration and redundant visits, particularly when rewards are sparse. The paper's results indicate that geometric priors improve autonomy and scalability for swarm and legged robots, and its waypoint interface targets onboard feasibility on resource-constrained platforms.
Key Takeaways
- Introduces a coverage framework that enhances multi-robot learning using Hilbert space-filling priors.
- Augments DQN and PPO with Hilbert-based spatial indices to structure exploration and reduce redundancy in sparse-reward environments.
- Reports improvements in coverage efficiency, redundancy, and convergence speed over DQN/PPO baselines.
- Validates the approach on a Boston Dynamics Spot legged robot that executes the generated trajectories in indoor environments.
- Highlights the potential for geometric priors to enhance scalability and autonomy in robotic systems.
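To make the idea of a Hilbert-based spatial index concrete, here is a minimal sketch of the standard Hilbert curve mapping between grid cells and curve positions on an n x n grid, where n is a power of two. The mapping itself is the well-known iterative algorithm. The `augment_observation` helper, its name, and the choice to append a normalized index to the observation are illustrative assumptions, not the authors' implementation, which this summary does not describe in detail.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Map grid cell (x, y) to its position d along the Hilbert curve.

    n is the grid side length and must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Map curve position d back to grid cell (x, y)."""
    t = d
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def augment_observation(obs, x, y, n):
    """Hypothetical helper: append the normalized Hilbert index of the
    robot's cell to an observation vector (assumed feature design)."""
    idx = xy2d(n, x, y) / (n * n - 1)
    return list(obs) + [idx]


if __name__ == "__main__":
    n = 8
    # Visiting cells in curve order yields a coverage path in which
    # consecutive cells are always grid neighbors.
    path = [d2xy(n, d) for d in range(n * n)]
    print(path[:8])
    print(augment_observation([0.5, 0.25], 3, 4, n))
```

Because consecutive curve positions are always adjacent cells, the ordering gives a sweep with no revisits on an empty grid, which is one plausible reason such a prior can reduce redundant coverage.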
arXiv:2602.19400 (cs)
[Submitted on 23 Feb 2026]
Title: Hilbert-Augmented Reinforcement Learning for Scalable Multi-Robot Coverage and Exploration
Authors: Tamil Selvan Gurunathan, Aryya Gangopadhyay
Abstract: We present a coverage framework that integrates Hilbert space-filling priors into decentralized multi-robot learning and execution. We augment DQN and PPO with Hilbert-based spatial indices to structure exploration and reduce redundancy in sparse-reward environments, and we evaluate scalability in multi-robot grid coverage. We further describe a waypoint interface that converts Hilbert orderings into curvature-bounded, time-parameterized SE(2) trajectories (planar (x, y, θ)), enabling onboard feasibility on resource-constrained robots. Experiments show improvements in coverage efficiency, redundancy, and convergence speed over DQN/PPO baselines. In addition, we validate the approach on a Boston Dynamics Spot legged robot, executing the generated trajectories in indoor environments and observing reliable coverage with low redundancy. These results indicate that geometric priors improve autonomy and scalability for swarm and legged robotics.
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as: arXiv...