[2603.23926] Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs
Computer Science > Machine Learning
arXiv:2603.23926 (cs) [Submitted on 25 Mar 2026]

Title: Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs
Authors: Guy Zamir, Matthew Zurek, Yudong Chen

Abstract: Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering from high "burn-in" costs and failing to adapt to benign instance-specific complexity. In this work, we address these shortcomings for two infinite-horizon objectives: the classical average-reward regret and the $\gamma$-regret. We develop a single tractable UCB-style algorithm applicable to both settings, which achieves the first optimal variance-dependent regret guarantees. Our regret bounds in both settings take the form $\tilde{O}(\sqrt{SA\,\text{Var}} + \text{lower-order terms})$, where $S, A$ are the state and action space sizes and $\text{Var}$ captures cumulative transition variance. This implies minimax-optimal average-reward and $\gamma$-regret bounds in the worst case, but also adapts to easier problem instances, for example yielding nearly constant regret in deterministic MDPs. Furthermore, our algorithm enjoys significantly improved lower-order terms for the average-reward setting. With prior ...
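The leading $\sqrt{SA\,\text{Var}}$ term in the bound is characteristic of Bernstein-style confidence bonuses, where the exploration bonus for each state-action pair scales with the empirical variance of next-state values rather than their worst-case range. The sketch below shows a generic bonus of this kind for a single state-action pair; it is a standard textbook construction for intuition only, not the paper's algorithm, and the function name, constants, and the next_values input are hypothetical.

    import numpy as np

    def bernstein_bonus(next_values, delta=0.05):
        # next_values: value estimates V(s') observed after taking the action;
        # their empirical variance stands in for the per-step transition
        # variance that the cumulative quantity Var aggregates.
        n = len(next_values)
        if n == 0:
            return np.inf  # unvisited pairs stay maximally optimistic
        log_term = np.log(2.0 / delta)
        var_hat = np.var(next_values)  # empirical next-state value variance
        # Bernstein shape: a sqrt(variance / n) leading term plus a 1/n
        # lower-order term, mirroring the sqrt(SA * Var) + lower-order
        # structure of the regret bound once summed over all visits.
        return np.sqrt(2.0 * var_hat * log_term / n) + 3.0 * log_term / n

    # Example: four hypothetical V(s') samples for one (s, a) pair.
    samples = np.array([0.9, 1.1, 1.0, 0.8])
    print(bernstein_bonus(samples))

In a deterministic MDP every next_values array is constant, so var_hat is zero and only the 1/n lower-order term survives, which is one way to see why variance-dependent bounds can yield nearly constant regret there.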