[2602.17685] Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling
Summary
This paper presents a novel approach to multi-target active debris removal in Low Earth Orbit using deep reinforcement learning, co-elliptic maneuvers, and refueling strategies.
Why It Matters
As space debris poses significant risks to satellites and space missions, effective debris removal strategies are crucial. This research highlights the potential of advanced machine learning techniques to enhance mission planning efficiency and safety in space operations.
Key Takeaways
- Introduces a unified framework for debris removal using co-elliptic maneuvers.
- Demonstrates that Masked Proximal Policy Optimization visits up to twice as many debris as a greedy heuristic and runs significantly faster than Monte Carlo Tree Search.
- Highlights the importance of deep reinforcement learning in scalable space mission planning.
- Benchmarks three planning algorithms, showcasing the advantages of modern RL techniques.
- Paves the way for future advancements in autonomous debris removal systems.
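The co-elliptic framework builds its transfers from Hohmann maneuvers between near-circular orbits. As an illustration of the underlying delta-V budget (a standard textbook calculation, not code from the paper; the altitudes are hypothetical example values):

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6371e3           # mean Earth radius, m

def hohmann_delta_v(r1: float, r2: float) -> float:
    """Total delta-V (m/s) for a two-impulse Hohmann transfer
    between circular orbits of radius r1 and r2 (meters)."""
    # First burn: depart the initial circular orbit onto the transfer ellipse
    dv1 = math.sqrt(MU_EARTH / r1) * abs(math.sqrt(2 * r2 / (r1 + r2)) - 1)
    # Second burn: circularize at the target radius
    dv2 = math.sqrt(MU_EARTH / r2) * abs(1 - math.sqrt(2 * r1 / (r1 + r2)))
    return dv1 + dv2

# Example: raise a chaser from a 400 km to a 600 km LEO altitude
dv = hohmann_delta_v(R_EARTH + 400e3, R_EARTH + 600e3)
print(f"{dv:.1f} m/s")  # roughly 110 m/s
```

Budgets of this size, summed over a sequence of debris targets, are what make the delta-V constraint (and refueling logic) central to the mission-planning problem.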
Computer Science > Machine Learning
arXiv:2602.17685 (cs) [Submitted on 4 Feb 2026]
Title: Optimal Multi-Debris Mission Planning in LEO: A Deep Reinforcement Learning Approach with Co-Elliptic Transfers and Refueling
Authors: Agni Bandyopadhyay, Gunther Waxenegger-Wilfing
Abstract: This paper addresses the challenge of multi-target active debris removal (ADR) in Low Earth Orbit (LEO) by introducing a unified co-elliptic maneuver framework that combines Hohmann transfers, safety-ellipse proximity operations, and explicit refueling logic. We benchmark three distinct planning algorithms (a Greedy heuristic, Monte Carlo Tree Search (MCTS), and deep reinforcement learning (RL) using Masked Proximal Policy Optimization (PPO)) within a realistic orbital simulation environment featuring randomized debris fields, keep-out zones, and delta-V constraints. Experimental results over 100 test scenarios demonstrate that Masked PPO achieves superior mission efficiency and computational performance, visiting up to twice as many debris as Greedy and significantly outperforming MCTS in runtime. These findings underscore the promise of modern RL methods for scalable, safe, and resource-efficient space mission planning, paving the way for future advancements in ADR ...
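The "Masked" in Masked PPO refers to restricting the policy to currently feasible actions, e.g. debris still reachable within the remaining delta-V budget and outside keep-out zones. A minimal sketch of that masking step, using hypothetical logit and mask values and plain Python rather than the authors' implementation:

```python
import math

def masked_softmax(logits, mask):
    """Turn raw policy logits into a distribution over valid actions only.

    mask[i] is True if action i is currently feasible; infeasible
    actions receive probability exactly 0.
    """
    neg_inf = float("-inf")
    # Suppress infeasible actions before normalizing
    masked = [l if m else neg_inf for l, m in zip(logits, mask)]
    # Numerically stable softmax over the valid entries
    peak = max(v for v in masked if v != neg_inf)
    exp = [math.exp(v - peak) if v != neg_inf else 0.0 for v in masked]
    total = sum(exp)
    return [e / total for e in exp]

# Hypothetical example: 4 candidate debris, targets at indices 1 and 3
# are infeasible (out of delta-V budget or inside a keep-out zone)
logits = [1.2, 0.3, -0.5, 2.0]
mask = [True, False, True, False]
probs = masked_softmax(logits, mask)
print(probs)  # infeasible actions get probability 0
```

Because masked actions have zero probability, the agent never samples an invalid transfer, which tends to speed up learning compared with penalizing invalid actions after the fact.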