[2603.23926] Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs

Computer Science > Machine Learning
arXiv:2603.23926 (cs)
Submitted on 25 Mar 2026

Title: Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs
Authors: Guy Zamir, Matthew Zurek, Yudong Chen

Abstract: Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering from high "burn-in" costs and failing to adapt to benign instance-specific complexity. In this work, we address these shortcomings for two infinite-horizon objectives: the classical average-reward regret and the $\gamma$-regret. We develop a single tractable UCB-style algorithm applicable to both settings, which achieves the first optimal variance-dependent regret guarantees. Our regret bounds in both settings take the form $\tilde{O}(\sqrt{SA\,\text{Var}} + \text{lower-order terms})$, where $S, A$ are the state and action space sizes, and $\text{Var}$ captures cumulative transition variance. This implies minimax-optimal average-reward and $\gamma$-regret bounds in the worst case but also adapts to easier problem instances, for example yielding nearly constant regret in deterministic MDPs. Furthermore, our algorithm enjoys significantly improved lower-order terms for the average-reward setting. With prior ...
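
To make the deterministic-MDP remark concrete, here is a minimal Python sketch of why the leading term $\sqrt{SA\,\text{Var}}$ vanishes when transitions are deterministic. The abstract does not give the paper's exact definition of $\text{Var}$, so this sketch assumes one common notion: the variance of a fixed value function at the next state, summed along a trajectory. The function names, the toy value function, and the two toy MDPs are all hypothetical and are not taken from the paper, and logarithmic factors and lower-order terms are ignored.

```python
import math


def next_state_variance(p_row, values):
    """Variance of values[s'] when s' is drawn from the distribution p_row."""
    mean = sum(p * v for p, v in zip(p_row, values))
    return sum(p * (v - mean) ** 2 for p, v in zip(p_row, values))


def cumulative_variance(P, values, trajectory):
    """Sum the next-state variance over the (state, action) pairs visited.

    Assumption: this stands in for the abstract's "cumulative transition
    variance"; the paper's own definition may differ.
    """
    return sum(next_state_variance(P[s][a], values) for s, a in trajectory)


def leading_term(S, A, var):
    """The sqrt(S * A * Var) term, without log factors or lower-order terms."""
    return math.sqrt(S * A * var)


# Toy setup with S = 2 states and A = 2 actions; P[s][a] is a distribution
# over next states. Both kernels and the value function are illustrative.
values = [0.0, 1.0]
deterministic = [[[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
noisy = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
trajectory = [(0, 0), (1, 1), (0, 1), (1, 0)]

det_var = cumulative_variance(deterministic, values, trajectory)
noisy_var = cumulative_variance(noisy, values, trajectory)

print(det_var, leading_term(2, 2, det_var))
print(noisy_var, leading_term(2, 2, noisy_var))
```

Under these assumptions every one-hot transition row contributes zero variance, so the leading term is zero for the deterministic kernel and only the lower-order terms of the bound would remain, which is consistent with the abstract's "nearly constant regret" remark. The uniform kernel accumulates variance at every step, so its leading term grows with the length of the trajectory.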

Originally published on March 26, 2026. Curated by AI News.
