[2603.25464] Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
Computer Science > Machine Learning
arXiv:2603.25464 (cs) [Submitted on 26 Mar 2026]

Title: Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
Authors: Jiajun Hu, Nuria Armengol Urpi, Jin Cheng, Stelian Coros

Abstract: Zero-shot reinforcement learning (RL) algorithms aim to learn a family of policies from a reward-free dataset and recover optimal policies for any reward function directly at test time. Naturally, the quality of the pretraining dataset determines the performance of the recovered policies across tasks. However, pre-collecting a relevant, diverse dataset without prior knowledge of the downstream tasks of interest remains a challenge. In this work, we study $\textit{online}$ zero-shot RL for quadrupedal control on real robotic systems, building upon the Forward-Backward (FB) algorithm. We observe that undirected exploration yields low-diversity data, leading to poor downstream performance and rendering policies impractical for direct hardware deployment. Therefore, we introduce FB-MEBE, an online zero-shot RL algorithm that combines an unsupervised behavior exploration strategy with a regularization critic. FB-MEBE promotes exploration by maximizing the entropy of the achieved behavior distribution. Additionally, a regularization critic shapes the recovered ...
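The abstract's core idea, rewarding exploration in proportion to the entropy of the achieved behavior distribution, is often realized in practice with a nonparametric k-nearest-neighbor entropy estimate over behavior embeddings (as in particle-based entropy bonuses). The paper does not specify its estimator, so the sketch below is an illustrative assumption, not FB-MEBE's actual implementation: the function `knn_entropy_bonus` and its embedding space are hypothetical.

```python
import numpy as np

def knn_entropy_bonus(behaviors: np.ndarray, k: int = 3) -> np.ndarray:
    """Per-sample exploration bonus from a k-NN entropy estimate.

    behaviors: (N, d) array of behavior embeddings (hypothetical space;
    the paper does not detail its estimator or embedding).
    Returns an (N,) array of bonuses log(1 + dist_k), where dist_k is
    each sample's distance to its k-th nearest neighbor.
    """
    # Pairwise Euclidean distances between behavior embeddings.
    diffs = behaviors[:, None, :] - behaviors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)       # (N, N)
    np.fill_diagonal(dists, np.inf)              # exclude self-distance
    knn_dist = np.sort(dists, axis=1)[:, k - 1]  # k-th nearest neighbor
    # Sparser region of behavior space => larger bonus => more reward
    # for reaching behaviors unlike those already collected.
    return np.log1p(knn_dist)

rng = np.random.default_rng(0)
# A tight cluster of similar behaviors plus one distinct outlier.
b = np.vstack([rng.normal(0.0, 0.05, size=(8, 2)),
               np.array([[5.0, 5.0]])])
bonus = knn_entropy_bonus(b, k=3)
print(bonus.argmax())  # the outlier behavior (index 8) gets the largest bonus
```

Maximizing the sum of such bonuses pushes the data-collection policy toward under-visited behaviors, which is the intuition behind the low-diversity-data failure mode the abstract describes.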