[2209.04999] Multi-Step First: A Lightweight Deep Reinforcement Learning Strategy for Robust Continuous Control with Partial Observability
Computer Science > Robotics

arXiv:2209.04999 (cs)

[Submitted on 12 Sep 2022 (v1), last revised 21 Mar 2026 (this version, v3)]

Title: Multi-Step First: A Lightweight Deep Reinforcement Learning Strategy for Robust Continuous Control with Partial Observability

Authors: Lingheng Meng, Rob Gorbet, Michael Burke, Dana Kulić

Abstract: Deep Reinforcement Learning (DRL) has made considerable advances in simulated and physical robot control tasks, especially when problems admit a fully observed Markov Decision Process (MDP) formulation. When observations only partially capture the underlying state, the problem becomes a Partially Observable MDP (POMDP), and performance rankings between algorithms can change. We empirically compare Proximal Policy Optimization (PPO), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC) on representative POMDP variants of continuous-control benchmarks. Contrary to widely reported MDP results, where TD3 and SAC typically outperform PPO, we observe an inversion: PPO attains higher robustness under partial observability. We attribute this to the stabilizing effect of multi-step bootstrapping. Furthermore, incorporating multi-step targets into TD3 (MTD3) and SAC (MSAC) improves their robustness. These findings...
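The abstract attributes PPO's robustness under partial observability, and the MTD3/MSAC improvements, to multi-step bootstrapping. Below is a minimal sketch of how an n-step bootstrapped TD target could be formed for a TD3-style twin-critic update; the function name `n_step_target`, the tensor shapes, and the callable critics/policy are illustrative assumptions, not the paper's released code.

```python
import torch

def n_step_target(rewards, final_obs, dones, target_q1, target_q2,
                  target_policy, gamma=0.99, n=5):
    """Hypothetical n-step bootstrapped TD target (a sketch of the
    multi-step idea the abstract describes, not the authors' code).

    rewards:   (batch, n) rewards r_t ... r_{t+n-1}
    final_obs: (batch, obs_dim) observation o_{t+n} to bootstrap from
    dones:     (batch, n) termination flags inside the n-step window
    """
    batch = rewards.shape[0]
    # Discount weights gamma^0 ... gamma^{n-1} for the n observed rewards
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)
    # alive[:, k] == 1 while no termination has occurred through step k
    alive = torch.cumprod(1.0 - dones, dim=1)
    # A reward at step k counts only if the episode survived steps < k
    mask = torch.cat([torch.ones(batch, 1, dtype=rewards.dtype),
                      alive[:, :-1]], dim=1)
    returns = (rewards * mask * discounts).sum(dim=1)
    # Bootstrap from the smaller twin target critic (TD3's clipped
    # double-Q trick), skipped if the episode ended inside the window
    with torch.no_grad():
        next_action = target_policy(final_obs)
        q_min = torch.min(target_q1(final_obs, next_action),
                          target_q2(final_obs, next_action)).squeeze(-1)
    return returns + (gamma ** n) * alive[:, -1] * q_min
```

Compared with the one-step target y_t = r_t + gamma * min(Q1', Q2')(o_{t+1}, pi'(o_{t+1})), the n-step target relies on observed rewards for n steps before consulting the critic, which is the stabilizing effect the abstract credits when individual observations are unreliable estimates of the underlying state.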