[2501.16443] Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
Summary
The paper presents OC-STORM, an object-centric model-based reinforcement learning framework that enhances sample efficiency by leveraging few-shot annotations for tracking object dynamics in complex environments.
Why It Matters
This research addresses sample inefficiency, a critical limitation of pixel-based reinforcement learning in real-world applications. Its object-centric approach directs model capacity toward semantically meaningful entities rather than pixel-level reconstruction, which can help agents learn in visually complex scenes where small, task-critical objects are easy to miss.
Key Takeaways
- OC-STORM improves sample efficiency in reinforcement learning.
- The framework utilizes few-shot annotations to track object dynamics.
- Empirical results show significant performance gains over existing methods.
- Object-centric representations enhance decision-making in complex scenes.
- The approach is particularly effective in visually challenging environments.
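To make the object-centric idea in the takeaways above concrete, here is a minimal sketch of turning per-object segmentation masks into a compact state vector that a world model could consume instead of raw pixels. It is an illustration only, not the paper's implementation: OC-STORM extracts its object representations with a pretrained segmentation network, whereas the hand-picked features here (normalized centroid, area fraction, presence flag), the function names `object_features` and `object_state`, and the toy `ball` and `paddle` masks are all assumptions made for this example.

```python
from typing import List, Sequence

# A binary mask for one object: rows of 0/1 values (hypothetical input format).
Mask = Sequence[Sequence[int]]


def object_features(mask: Mask) -> List[float]:
    """Summarize one object's mask as [cx, cy, area_fraction, present].

    cx and cy are the mask centroid scaled to [0, 1]; an empty mask
    (object not visible) yields all zeros.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return [0.0, 0.0, 0.0, 0.0]
    area = len(xs)
    cx = sum(xs) / area / max(width - 1, 1)
    cy = sum(ys) / area / max(height - 1, 1)
    return [cx, cy, area / (width * height), 1.0]


def object_state(masks: Sequence[Mask]) -> List[float]:
    """Concatenate per-object features into one flat state vector."""
    state: List[float] = []
    for mask in masks:
        state.extend(object_features(mask))
    return state


# Toy 4x4 frame with three tracked object slots (illustrative data).
ball = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
paddle = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
missing = [[0] * 4 for _ in range(4)]

state = object_state([ball, paddle, missing])
```

The point of the sketch is the contrast in scale: a one-pixel ball contributes almost nothing to a pixel-level reconstruction loss, but it occupies a full slot in the object-level state, so a dynamics model trained on that state cannot ignore it.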
Computer Science > Machine Learning
arXiv:2501.16443 (cs)
[Submitted on 27 Jan 2025 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
Authors: Weipu Zhang, Adam Jelley, Trevor McInroe, Amos Storkey, Gang Wang
Abstract: While deep reinforcement learning (RL) from pixels has achieved remarkable success, its sample inefficiency remains a critical limitation for real-world applications. Model-based RL (MBRL) addresses this by learning a world model to generate simulated experience, but standard approaches that rely on pixel-level reconstruction losses often fail to capture small, task-critical objects in complex, dynamic scenes. We posit that an object-centric (OC) representation can direct model capacity toward semantically meaningful entities, improving dynamics prediction and sample efficiency. In this work, we introduce OC-STORM, an object-centric MBRL framework that enhances a learned world model with object representations extracted by a pretrained segmentation network. By conditioning on a minimal number of annotated frames, OC-STORM learns to track decision-relevant object dynamics and inter-object interactions without extensive labeling or access to privileged information. Emp...