[2510.18316] MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Summary
The paper presents MoMaGen, a novel approach for generating diverse datasets for multi-step bimanual mobile manipulation by addressing reachability and visibility constraints.
Why It Matters
As robotics increasingly relies on imitation learning from human demonstrations, MoMaGen's ability to generate diverse datasets efficiently is crucial. It addresses the complexities of mobile manipulation, which is vital for advancing robotic capabilities in real-world applications.
Key Takeaways
- MoMaGen formulates data generation as a constrained optimization problem, balancing hard and soft constraints.
- The method significantly increases dataset diversity for training imitation learning policies.
- MoMaGen allows for effective training with minimal real-world data, enhancing deployment feasibility.
- The approach generalizes prior automated data generation frameworks (the X-Gen line of work).
- Evaluation on multiple tasks demonstrates improved performance over prior methods.
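The hard/soft constraint split above can be pictured as sampling-based optimization: candidate configurations that violate a hard constraint (e.g., reachability, collision) are rejected outright, while soft constraints contribute penalties to minimize among the survivors. The sketch below is purely illustrative; the scene geometry, constraint functions, and sampling scheme are assumptions for exposition, not the paper's actual implementation.

```python
import math
import random

# Hypothetical 2D scene (assumed values, not from the paper)
TARGET = (2.0, 1.0)    # object the arm must reach
REACH = 1.2            # assumed arm reach radius
OBSTACLE = (1.0, 1.0)  # assumed obstacle center
OBSTACLE_R = 0.4       # assumed obstacle radius

def hard_constraints_ok(base):
    """Hard constraints must hold exactly: collision-free base placement
    and the target within the arm's reach."""
    if math.hypot(base[0] - OBSTACLE[0], base[1] - OBSTACLE[1]) < OBSTACLE_R:
        return False  # base collides with the obstacle
    return math.hypot(TARGET[0] - base[0], TARGET[1] - base[1]) <= REACH

def soft_cost(base):
    """Soft constraints are penalties to minimize: here, prefer placements
    where the target sits mid-workspace rather than at the reach boundary."""
    d = math.hypot(TARGET[0] - base[0], TARGET[1] - base[1])
    return abs(d - 0.6 * REACH)

def sample_base_pose(n_samples=2000, seed=0):
    """Rejection-sample base placements: discard hard-constraint violations,
    keep the candidate with the lowest soft-constraint cost."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        base = (rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0))
        if not hard_constraints_ok(base):
            continue
        c = soft_cost(base)
        if c < best_cost:
            best, best_cost = base, c
    return best, best_cost

pose, cost = sample_base_pose()
```

The design choice mirrors the takeaway: hard constraints gate feasibility (a violated sample is useless for data generation), while soft constraints only rank the feasible samples, which preserves diversity across the feasible set.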
Computer Science > Robotics, arXiv:2510.18316 (cs)
[Submitted on 21 Oct 2025 (v1), last revised 22 Feb 2026 (this version, v2)]
Title: MoMaGen: Generating Demonstrations under Soft and Hard Constraints for Multi-Step Bimanual Mobile Manipulation
Authors: Chengshu Li, Mengdi Xu, Arpit Bahety, Hang Yin, Yunfan Jiang, Huang Huang, Josiah Wong, Sujay Garlanka, Cem Gokmen, Ruohan Zhang, Weiyu Liu, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
Abstract: Imitation learning from large-scale, diverse human demonstrations has been shown to be effective for training robots, but collecting such data is costly and time-consuming. This challenge intensifies for multi-step bimanual mobile manipulation, where humans must teleoperate both a mobile base and two high-DoF arms. Prior X-Gen works have developed automated data generation frameworks for static (bimanual) manipulation tasks, augmenting a few human demonstrations in simulation with novel scene configurations to synthesize large-scale datasets. However, these frameworks fall short for bimanual mobile manipulation tasks for two major reasons: 1) a mobile base introduces the problem of how to place the robot base to enable downstream manipulation (reachability), and 2) an active camera introduces the problem of how to position the camera to genera...