[2505.19698] Performance Asymmetry in Model-Based Reinforcement Learning
Summary
The paper examines performance asymmetry in Model-Based Reinforcement Learning (MBRL): agents dramatically outperform humans on some Atari100k task types while drastically underperforming on others. It quantifies this disparity and proposes a novel world model (JEDI) to address it.
Why It Matters
Understanding performance asymmetry in MBRL is crucial for developing more effective AI systems. This research reveals critical insights into how agents perform in varying contexts, which can inform future advancements in reinforcement learning techniques and applications.
Key Takeaways
- MBRL achieves super-human performance on the Atari100k benchmark on average, but the average masks severe failures on specific tasks.
- Performance asymmetry exists, with agents excelling in Agent-Optimal tasks but underperforming in Human-Optimal tasks.
- A new aggregate measure, Sym-HNS, is proposed to better evaluate agent performance.
- The JEDI world model improves performance across task types while enhancing computational efficiency.
- Addressing performance asymmetry is vital for the future of reinforcement learning applications.
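To make the evaluation issue concrete, here is a minimal sketch of the standard Human-Normalized Score (HNS) and one plausible symmetric aggregate over the two task subsets. The paper does not spell out the exact Sym-HNS formula here, so the `sym_hns` function below is an illustrative assumption (a harmonic-style mean of per-subset mean HNS), not the paper's definition:

```python
import statistics

def hns(score, random_score, human_score):
    """Human-Normalized Score: 0.0 = random play, 1.0 = human level."""
    return (score - random_score) / (human_score - random_score)

def sym_hns(agent_optimal_hns, human_optimal_hns):
    """Hypothetical balanced aggregate over the two Atari100k subsets.

    Illustrative only: a harmonic-style mean of the per-subset mean
    HNS, which stays low whenever either subset's performance is low,
    unlike a plain arithmetic mean over all tasks.
    """
    a = statistics.mean(agent_optimal_hns)  # Agent-Optimal subset
    h = statistics.mean(human_optimal_hns)  # Human-Optimal subset
    return 2 * a * h / (a + h)

# An agent strong on Agent-Optimal tasks but weak on Human-Optimal ones
agent_opt = [4.0, 6.0]   # far above human (HNS > 1)
human_opt = [0.2, 0.3]   # far below human (HNS < 1)

overall_mean = statistics.mean(agent_opt + human_opt)  # looks super-human
balanced = sym_hns(agent_opt, human_opt)               # exposes the asymmetry
```

With these numbers the plain mean is 2.625 (comfortably "super-human"), while the balanced aggregate is about 0.48, reflecting the agent's failure on the Human-Optimal subset.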
Computer Science > Machine Learning, arXiv:2505.19698 (cs)
[Submitted on 26 May 2025 (v1), last revised 24 Feb 2026 (this version, v3)]
Title: Performance Asymmetry in Model-Based Reinforcement Learning
Authors: Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu
Abstract: Recently, Model-Based Reinforcement Learning (MBRL) has achieved super-human performance on the Atari100k benchmark on average. However, we discover that conventional aggregates mask a major problem, Performance Asymmetry: MBRL agents dramatically outperform humans in certain tasks (Agent-Optimal tasks) while drastically underperforming humans in other tasks (Human-Optimal tasks). Indeed, despite achieving SOTA in overall mean Human-Normalized Score (HNS), the SOTA agent scored the worst among baselines on Human-Optimal tasks, with a striking 21X performance gap between the Human-Optimal and Agent-Optimal subsets. To address this, we partition Atari100k evenly into Human-Optimal and Agent-Optimal subsets, and introduce a more balanced aggregate, Sym-HNS. Furthermore, we trace the striking Performance Asymmetry in the SOTA pixel diffusion world model to the curse of dimensionality and its prowess on high-visual-detail tasks (e.g., Breakout). To this end, we propose a novel latent end-to-end Joint Embedding DIffusion (JEDI) world model...