[2410.05225] ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control
Summary
The paper introduces ETGL-DDPG, a novel deep deterministic policy gradient algorithm designed to enhance exploration in reinforcement learning with sparse rewards, demonstrating superior performance on standard benchmarks.
Why It Matters
This research addresses a significant challenge in reinforcement learning: effectively exploring environments with sparse rewards. By improving exploration strategies and experience replay mechanisms, the findings could lead to advancements in various applications, including robotics and AI systems that require efficient learning from limited feedback.
Key Takeaways
- ETGL-DDPG integrates three innovative techniques to enhance DDPG performance.
- The proposed $\epsilon t$-greedy search improves exploration in sparse-reward environments.
- The dual experience replay buffer framework, GDRB, optimizes the use of rewarded transitions.
- Ablation studies confirm the individual contributions of each strategy to overall performance.
- ETGL-DDPG outperforms existing state-of-the-art methods in tested environments.
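To make the exploration idea in the takeaways concrete, here is a minimal, hypothetical sketch of an $\epsilon t$-greedy-style action choice: with probability epsilon, move toward the least-visited predicted successor state instead of acting greedily. The function names, the visit-count heuristic, and the single-step form are assumptions for illustration; the paper's actual procedure builds multi-step exploratory options.

```python
import random

def epsilon_t_greedy_action(state, policy, candidates, predict_next,
                            visit_counts, epsilon=0.1, rng=random.random):
    """With probability epsilon, take an exploratory step toward the
    least-visited predicted successor state; otherwise act greedily.
    Hypothetical single-step sketch, not the paper's implementation."""
    if rng() < epsilon:
        # Exploratory branch: prefer actions leading to rarely-seen states.
        actions = candidates(state)
        return min(actions,
                   key=lambda a: visit_counts.get(predict_next(state, a), 0))
    # Exploitation branch: deterministic actor, as in DDPG.
    return policy(state)

# Toy 1-D chain: states are integers, actions move left (-1) or right (+1).
visit_counts = {0: 5, 1: 5, 2: 0}      # state 2 is least visited
policy = lambda s: -1                  # greedy policy always moves left
candidates = lambda s: [-1, +1]
predict_next = lambda s, a: s + a

# Forcing the exploratory branch (epsilon=1.0) picks the action
# that reaches the unvisited state 2.
a = epsilon_t_greedy_action(1, policy, candidates, predict_next,
                            visit_counts, epsilon=1.0)
print(a)  # 1
```

Forcing `epsilon=0.0` instead would return the greedy action `-1`, showing how the two branches trade off.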
Computer Science > Machine Learning
arXiv:2410.05225 (cs)
Submitted on 7 Oct 2024 (v1); last revised 17 Feb 2026 (v3)
Title: ETGL-DDPG: A Deep Deterministic Policy Gradient Algorithm for Sparse Reward Continuous Control
Authors: Ehsan Futuhi, Shayan Karimi, Chao Gao, Martin Müller
Abstract: We consider deep deterministic policy gradient (DDPG) in the context of reinforcement learning with sparse rewards. To enhance exploration, we introduce a search procedure, $\epsilon t$-greedy, which generates exploratory options for exploring less-visited states. We prove that search using $\epsilon t$-greedy has polynomial sample complexity under mild MDP assumptions. To more efficiently use the information provided by rewarded transitions, we develop a new dual experience replay buffer framework, GDRB, and implement longest n-step returns. The resulting algorithm, ETGL-DDPG, integrates all three techniques: $\epsilon t$-greedy, GDRB, and Longest n-step returns, into DDPG. We evaluate ETGL-DDPG on standard benchmarks and demonstrate that it outperforms DDPG, as well as other state-of-the-art methods, across all tested sparse-reward continuous environments. Ablation studies further highlight how each strategy individually enhances the performance of ETGL-DDPG.
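The dual-buffer idea from the abstract can be sketched as follows: a minimal, hypothetical illustration assuming a GDRB-style design stores rewarded transitions separately and samples a mixed minibatch so sparse successes are not drowned out. The class name, the mixing fraction, and the `r > 0` test for "rewarded" are assumptions, not the paper's specification.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Hypothetical sketch of a GDRB-style dual buffer: rewarded
    transitions live in their own buffer so the few sparse successes
    stay available alongside the far more numerous zero-reward ones."""
    def __init__(self, capacity=100_000, rewarded_fraction=0.25):
        self.regular = deque(maxlen=capacity)
        self.rewarded = deque(maxlen=capacity)
        self.rewarded_fraction = rewarded_fraction

    def add(self, transition):
        s, a, r, s_next, done = transition
        # Assumed criterion: any positive reward marks a "rewarded" transition.
        (self.rewarded if r > 0 else self.regular).append(transition)

    def sample(self, batch_size):
        # Reserve a fixed fraction of the minibatch for rewarded transitions.
        k = min(int(batch_size * self.rewarded_fraction), len(self.rewarded))
        batch = random.sample(list(self.rewarded), k)
        batch += random.sample(list(self.regular),
                               min(batch_size - k, len(self.regular)))
        return batch

buf = DualReplayBuffer()
for i in range(100):
    buf.add((i, 0, 0.0, i + 1, False))   # many zero-reward transitions
buf.add((100, 0, 1.0, 101, True))        # one sparse success
batch = buf.sample(8)
print(sum(1 for (_, _, r, _, _) in batch if r > 0))  # 1
```

With a single uniform buffer, the one success would appear in an 8-element minibatch only about 8% of the time; reserving a slice of each batch for rewarded transitions is one simple way to keep that signal present during updates.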