[2602.16629] Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes

arXiv - AI · Article

Summary

This paper proves the almost sure convergence of differential temporal difference (TD) learning for average-reward Markov decision processes, removing the local-clock assumption that existing convergence guarantees rely on.

Why It Matters

Convergence guarantees matter to reinforcement learning practitioners because they underpin trust in an algorithm's behavior. By dropping the reliance on local clocks, which is step sizes tied to per-state visit counts that practitioners rarely use and that do not extend beyond the tabular setting, this work brings the theory of differential TD learning closer to how the algorithms are actually run.

Key Takeaways

  • Proves almost sure convergence of on-policy n-step differential TD learning without local clocks.
  • Establishes sufficient conditions for off-policy n-step differential TD convergence.
  • Strengthens theoretical foundations of differential TD learning for average reward scenarios.

arXiv:2602.16629 (cs.LG) · Submitted on 18 Feb 2026
Title: Almost Sure Convergence of Differential Temporal Difference Learning for Average Reward Markov Decision Processes
Authors: Ethan Blaser, Jiuqi Wang, Shangtong Zhang

Abstract: The average reward is a fundamental performance metric in reinforcement learning (RL) focusing on the long-run performance of an agent. Differential temporal difference (TD) learning algorithms are a major advance for average reward RL as they provide an efficient online method to learn the value functions associated with the average reward in both on-policy and off-policy settings. However, existing convergence guarantees require a local clock in learning rates tied to state visit counts, which practitioners do not use and does not extend beyond tabular settings. We address this limitation by proving the almost sure convergence of on-policy $n$-step differential TD for any $n$ using standard diminishing learning rates without a local clock. We then derive three sufficient conditions under which off-policy $n$-step differential TD also converges without a local clock. These results strengthen the theoretical foundations of differential TD and bring its convergence analysis closer to practical implementations.

Subjects: Machine Learning (cs.LG)
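To make the algorithmic setting concrete, here is a minimal tabular sketch of one-step on-policy differential TD (in the style of Wan, Naik, and Sutton's average-reward algorithm) run on a toy two-state MDP. Everything in it is illustrative rather than taken from the paper: the MDP, the relative step size `eta`, and the global diminishing schedule `alpha = (t+1)**-0.7` are assumptions. The point it demonstrates is the one the paper analyzes: the step size decays with a single global counter `t`, not with a per-state visit count (no "local clock").

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state MDP: reward depends only on the current state
# (state 0 pays 1, state 1 pays 0), and both states transition uniformly,
# so the true average reward is 0.5.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
r = np.array([1.0, 0.0])

V = np.zeros(2)   # differential (relative) value estimates
avg_r = 0.0       # running estimate of the average reward
eta = 0.5         # relative step size for the average-reward update (assumed)

s = 0
for t in range(200_000):
    # Global diminishing step size -- note it does NOT depend on how often
    # state s has been visited, i.e. there is no local clock.
    alpha = (t + 1) ** -0.7
    s_next = rng.choice(2, p=P[s])
    reward = r[s]
    # Differential TD error: reward is compared against the average-reward
    # estimate instead of being discounted.
    delta = reward - avg_r + V[s_next] - V[s]
    V[s] += alpha * delta
    avg_r += eta * alpha * delta
    s = s_next

print(avg_r)  # should approach the true average reward of 0.5
```

A local-clock variant would replace `alpha` with a per-state schedule such as `1 / visits[s]`; the paper's contribution is showing convergence holds under the simpler global schedule above, which is what implementations actually use.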
