[2505.19698] Performance Asymmetry in Model-Based Reinforcement Learning

arXiv - Machine Learning

Summary

The paper explores performance asymmetry in Model-Based Reinforcement Learning (MBRL), highlighting significant disparities in agent performance across different task types and proposing a novel world model, JEDI, to address them.

Why It Matters

Understanding performance asymmetry in MBRL is crucial for developing more effective AI systems. This research reveals critical insights into how agents perform in varying contexts, which can inform future advancements in reinforcement learning techniques and applications.

Key Takeaways

  • MBRL shows super-human performance on average but struggles in specific tasks.
  • Performance asymmetry exists, with agents excelling in Agent-Optimal tasks but underperforming in Human-Optimal tasks.
  • A new aggregate measure, Sym-HNS, is proposed to better evaluate agent performance.
  • The JEDI world model improves performance across task types while enhancing computational efficiency.
  • Addressing performance asymmetry is vital for the future of reinforcement learning applications.

Computer Science > Machine Learning

arXiv:2505.19698 (cs) [Submitted on 26 May 2025 (v1), last revised 24 Feb 2026 (this version, v3)]

Title: Performance Asymmetry in Model-Based Reinforcement Learning

Authors: Jing Yu Lim, Rushi Shah, Zarif Ikram, Samson Yu, Haozhe Ma, Tze-Yun Leong, Dianbo Liu

Abstract: Recently, Model-Based Reinforcement Learning (MBRL) has achieved super-human performance on the Atari100k benchmark on average. However, we discover that conventional aggregates mask a major problem, Performance Asymmetry: MBRL agents dramatically outperform humans in certain tasks (Agent-Optimal tasks) while drastically underperforming humans in other tasks (Human-Optimal tasks). Indeed, despite achieving SOTA in overall mean Human-Normalized Score (HNS), the SOTA agent scored the worst among baselines on Human-Optimal tasks, with a striking 21X performance gap between the Human-Optimal and Agent-Optimal subsets. To address this, we partition Atari100k evenly into Human-Optimal and Agent-Optimal subsets and introduce a more balanced aggregate, Sym-HNS. Furthermore, we trace the striking Performance Asymmetry in the SOTA pixel diffusion world model to the curse of dimensionality and its prowess on high-visual-detail tasks (e.g., Breakout). To this end, we propose a novel latent end-to-end Joint Embedding DIffusion (JEDI) world model...
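The HNS and balanced-aggregate idea above can be sketched in a few lines. This is a minimal sketch, not the paper's code: the standard Atari HNS formula is well known, but the exact Sym-HNS definition is not given in this summary, so the harmonic-mean aggregate below is an assumption chosen only to illustrate how a symmetric measure penalizes asymmetry.

```python
# Minimal sketch of Human-Normalized Score (HNS) and a symmetric
# aggregate in the spirit of Sym-HNS. ASSUMPTION: the paper's exact
# Sym-HNS formula is not given in this summary; the harmonic mean
# below is a stand-in illustrating why a plain mean hides asymmetry.

def hns(agent_score: float, random_score: float, human_score: float) -> float:
    """Standard Atari HNS: 0.0 = random play, 1.0 = human play."""
    return (agent_score - random_score) / (human_score - random_score)

def mean(xs):
    return sum(xs) / len(xs)

def sym_hns(agent_optimal, human_optimal):
    """Harmonic mean of the two subset means (illustrative stand-in).

    Unlike the arithmetic mean, this drops toward zero whenever either
    subset's mean HNS is near zero, so strong Agent-Optimal scores
    cannot mask weak Human-Optimal scores.
    """
    a, h = mean(agent_optimal), mean(human_optimal)
    return 2.0 * a * h / (a + h)

# Toy HNS values: huge on Agent-Optimal tasks, weak on Human-Optimal ones.
agent_opt = [12.0, 8.0]   # subset mean 10.0 (far above human)
human_opt = [0.2, 0.3]    # subset mean 0.25 (well below human)
print(mean(agent_opt + human_opt))    # 5.125 -- looks super-human
print(sym_hns(agent_opt, human_opt))  # ~0.49 -- exposes the gap
```

With an even partition of Atari100k, the arithmetic mean of the two subset means equals the overall mean, which is exactly why it cannot detect asymmetry; any genuinely symmetric aggregate must weight the weaker subset more heavily, as the harmonic mean does here.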
