[2502.08941] Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation
Summary
This paper analyzes off-policy n-step TD-learning algorithms with linear function approximation, proving that they converge to meaningful solutions once the sampling horizon n is sufficiently large.
Why It Matters
The findings sharpen our understanding of the "deadly triad" in reinforcement learning: the combination of off-policy learning, bootstrapping, and linear function approximation, which is known to cause divergence. Characterizing when a longer sampling horizon restores convergence guides the design of more reliable off-policy algorithms.
Key Takeaways
- The paper proves that n-step TD-learning algorithms converge once the sampling horizon n is sufficiently large.
- It examines the fundamental properties of their model-based deterministic counterparts, such as projected value iteration and gradient descent algorithms, whose analysis guides the development of the model-free algorithms.
- Two new n-step TD-learning algorithms are proposed, the model-free reinforcement learning counterparts of these deterministic prototypes.
Paper Details
Computer Science > Machine Learning, arXiv:2502.08941 (cs). Submitted on 13 Feb 2025 (v1), last revised 23 Feb 2026 (this version, v3).
Title: Analysis of Off-Policy $n$-Step TD-Learning with Linear Function Approximation
Authors: Han-Dong Lim, Donghwan Lee
Abstract
This paper analyzes multi-step temporal difference (TD)-learning algorithms within the "deadly triad" scenario, characterized by linear function approximation, off-policy learning, and bootstrapping. In particular, we prove that $n$-step TD-learning algorithms converge to a solution as the sampling horizon $n$ increases sufficiently. The paper is divided into two parts. In the first part, we comprehensively examine the fundamental properties of their model-based deterministic counterparts, including projected value iteration and gradient descent algorithms, which can be viewed as prototype deterministic algorithms whose analysis plays a pivotal role in understanding and developing their model-free reinforcement learning counterparts. In particular, we prove that these algorithms converge to meaningful solutions when $n$ is sufficiently large. Based on these findings, in the second part, two $n$-step TD-learning algorithms are proposed and analyzed, which can be seen as the model-free reinforcement learning counterparts of the model-based deterministic algorithms.
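The abstract's first-part claim can be illustrated numerically with the deterministic prototype of n-step TD: projected value iteration, whose parameter update is governed by the linear map $B_n = (\Phi^\top D \Phi)^{-1} \Phi^\top D (\gamma P)^n \Phi$, where $D$ holds the off-policy sampling distribution and $P$ is the target-policy transition matrix. The iteration converges iff the spectral radius of $B_n$ is below 1, and since $(\gamma P)^n$ shrinks like $\gamma^n$, a large enough $n$ always restores contraction. The two-state chain, features, and sampling distribution below are hypothetical, not taken from the paper.

```python
import numpy as np

gamma = 0.95
# Hypothetical two-state chain: both states move to state 2 under the target
# policy, so P is idempotent and (gamma * P)^n = gamma^n * P.
P = np.array([[0.0, 1.0],
              [0.0, 1.0]])
d_b = np.array([0.9, 0.1])          # off-policy sampling distribution
D = np.diag(d_b)
Phi = np.array([[1.0],
                [2.0]])             # one linear feature per state

def iteration_gain(n):
    """Spectral radius of B_n = (Phi^T D Phi)^{-1} Phi^T D (gamma P)^n Phi,
    the linear map governing projected n-step value iteration."""
    Gn = np.linalg.matrix_power(gamma * P, n)
    Bn = np.linalg.solve(Phi.T @ D @ Phi, Phi.T @ D @ Gn @ Phi)
    return max(abs(np.linalg.eigvals(Bn)))

print(iteration_gain(1))   # > 1: one-step projected iteration diverges
print(iteration_gain(12))  # < 1: a large horizon restores contraction
```

Here the one-step gain is about 1.61, so off-policy projected value iteration diverges despite the discount factor, while by n = 12 the gain drops below 1; this mirrors the paper's result that the deterministic prototypes, and hence their model-free TD counterparts, converge once n is sufficiently large.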