[2602.12636] Dual-Granularity Contrastive Reward via Generated Episodic Guidance for Efficient Embodied RL
Summary
This paper introduces the Dual-Granularity Contrastive Reward framework, which enhances sample efficiency in reinforcement learning (RL) for embodied tasks without requiring extensive human supervision.
Why It Matters
As reinforcement learning applications expand, particularly in robotics, the challenge of designing effective reward systems remains critical. This research addresses the limitations of existing methods by proposing a novel approach that reduces reliance on human-annotated data, potentially accelerating advancements in autonomous systems.
Key Takeaways
- The Dual-Granularity Contrastive Reward framework improves sample efficiency in RL.
- It utilizes generated episodic guidance from a limited number of expert videos.
- The framework balances coarse- and fine-grained rewards to enhance agent training.
- Extensive experiments demonstrate its effectiveness across diverse tasks.
- This approach could lead to more autonomous and efficient robotic systems.
Computer Science > Machine Learning
arXiv:2602.12636 (cs)
[Submitted on 13 Feb 2026]
Title: Dual-Granularity Contrastive Reward via Generated Episodic Guidance for Efficient Embodied RL
Authors: Xin Liu, Yixuan Li, Yuhui Chen, Yuxing Qin, Haoran Li, Dongbin Zhao
Abstract: Designing suitable rewards poses a significant challenge in reinforcement learning (RL), especially for embodied manipulation. Trajectory success rewards are easy for human judges to assign or for models to fit, but their sparsity severely limits RL sample efficiency. While recent methods have effectively improved RL via dense rewards, they rely heavily on high-quality human-annotated data or abundant expert supervision. To tackle these issues, this paper proposes Dual-granularity contrastive reward via generated Episodic Guidance (DEG), a novel framework that seeks sample-efficient dense rewards without requiring human annotations or extensive supervision. Leveraging the prior knowledge of large video generation models, DEG needs only a small number of expert videos for domain adaptation to generate dedicated task guidance for each RL episode. The proposed dual-granularity reward, which balances coarse-grained exploration and fine-grained matching, then guides the agent to sequentially approximate the generated guidance video in the contrastive...
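The dual-granularity idea described in the abstract can be sketched roughly as follows: a coarse-grained term rewards progress toward the overall goal shown in the generated guidance video, while a fine-grained term rewards matching guidance frames near the current timestep. This is a minimal illustrative sketch, not the paper's formulation; the cosine-similarity rewards, the matching window, and the weighting parameter `alpha` are all assumptions, and in practice the embeddings would come from a learned contrastive encoder rather than raw vectors.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def dual_granularity_reward(obs_emb, guidance_embs, step, alpha=0.5, window=2):
    """Illustrative dual-granularity reward (hypothetical formulation).

    obs_emb       -- embedding of the agent's current observation
    guidance_embs -- per-frame embeddings of the generated guidance video
    step          -- current timestep, used to select frames to match
    alpha         -- coarse/fine weighting (assumed, not from the paper)
    """
    # Coarse-grained term: similarity to the guidance video's final frame,
    # steering exploration toward the episode's overall goal state.
    coarse = cosine(obs_emb, guidance_embs[-1])

    # Fine-grained term: best match among guidance frames near the current
    # step, encouraging the agent to follow the guidance sequentially.
    lo = max(0, step - window)
    hi = min(len(guidance_embs), step + window + 1)
    fine = max(cosine(obs_emb, g) for g in guidance_embs[lo:hi])

    return alpha * coarse + (1 - alpha) * fine
```

A dense reward like this could be queried at every environment step, replacing the sparse end-of-trajectory success signal the abstract identifies as the sample-efficiency bottleneck.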