[2511.14427] Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning
Computer Science > Robotics

arXiv:2511.14427 (cs)

[Submitted on 18 Nov 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning

Authors: Rickmer Krohn, Vignesh Prasad, Gabriele Tiboni, Georgia Chalvatzaki

Abstract: Effective contact-rich manipulation requires robots to synergistically leverage vision, force, and proprioception. However, reinforcement learning agents struggle to learn in such multisensory settings, especially amid sensory noise and dynamic changes. We propose MultiSensory Dynamic Pretraining (MSDP), a novel framework for learning expressive multisensory representations tailored to task-oriented policy learning. MSDP is based on masked autoencoding: it trains a transformer-based encoder by reconstructing multisensory observations from only a subset of sensor embeddings, leading to cross-modal prediction and sensor fusion. For downstream policy learning, we introduce an asymmetric architecture in which a cross-attention mechanism allows the critic to extract dynamic, task-specific features from the frozen embeddings, while the actor receives a stable pooled representation to guide its actions. Our method demonstrates accelerated learning and robust performance under diverse perturbations...
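To make the two ideas in the abstract concrete, here is a minimal NumPy sketch of (a) masked autoencoding across sensor modalities, where only a subset of modality embeddings is visible and all modalities are reconstructed from the fused code, and (b) the asymmetric downstream architecture, where the critic cross-attends over the per-modality embeddings while the actor receives a pooled summary. This is an illustrative stand-in, not the paper's implementation: the dimensions, the linear encoders/decoders in place of the transformer, and the single-query attention are all simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality input sizes and a shared embedding size
# (chosen for illustration; not taken from the paper).
D_OBS = {"vision": 16, "force": 6, "proprio": 7}
D_EMB = 8

# Linear encoders/decoders stand in for the transformer-based encoder.
enc = {m: rng.normal(size=(d, D_EMB)) * 0.1 for m, d in D_OBS.items()}
dec = {m: rng.normal(size=(D_EMB, d)) * 0.1 for m, d in D_OBS.items()}

def msdp_reconstruct(obs, visible):
    """Masked-autoencoding step: encode only the `visible` modalities,
    fuse them by mean-pooling, then decode *all* modalities, forcing
    cross-modal prediction (masked sensors must be inferred from the rest)."""
    tokens = [obs[m] @ enc[m] for m in visible]
    fused = np.mean(tokens, axis=0)               # sensor fusion
    return {m: fused @ dec[m] for m in D_OBS}     # reconstruct every sensor

def critic_cross_attention(query, tokens):
    """Critic side of the asymmetric architecture: a learned query
    attends over the frozen per-modality embeddings, so the critic can
    weight sensors dynamically per task/state."""
    scores = np.array([query @ t for t in tokens]) / np.sqrt(D_EMB)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return sum(wi * t for wi, t in zip(w, tokens))

# One observation; mask out force and reconstruct it from vision + proprio.
obs = {m: rng.normal(size=d) for m, d in D_OBS.items()}
recon = msdp_reconstruct(obs, visible=["vision", "proprio"])

# Downstream: actor gets a stable pooled code, critic attends over tokens.
tokens = [obs[m] @ enc[m] for m in D_OBS]
actor_input = np.mean(tokens, axis=0)
critic_feat = critic_cross_attention(rng.normal(size=D_EMB), tokens)
```

In this sketch the asymmetry is visible in the interfaces: the actor consumes a fixed-size pooled vector regardless of which sensors are informative, while the critic's attention weights can shift toward whichever modality embeddings carry task-relevant signal.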