[2402.11877] Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

arXiv - Machine Learning March 31, 2026 3 min read

About this article

Abstract page for arXiv paper 2402.11877: Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

arXiv:2402.11877 (cs)

[Submitted on 19 Feb 2024 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

Authors: Han-Dong Lim, HyeAnn Lee, Donghwan Lee

Abstract: Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we investigate the sample complexity of $Q$-learning when integrated with a model-based approach. The proposed algorithm learns both the model and the $Q$-value in an online manner. We demonstrate a near-optimal sample complexity result within a broad range of step sizes.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

Cite as: arXiv:2402.11877 [cs.LG] (or arXiv:2402.11877v2 [cs.LG] for this version)

DOI: https://doi.org/10.48550/arXiv.2402.11877 (arXiv-issued DOI via DataCite)

Submission history
From: Han-Dong Lim
[v1] Mon, 19 Feb 2024 06:33:51 UTC (787 KB)
[v2] Mon, 30 Mar 2026 14:38:20 UTC (532 KB)
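The abstract says the algorithm learns both the model and the Q-value online, but gives no update rule. The sketch below is therefore not the paper's SyncMBQ algorithm. It is a generic tabular illustration of the idea, under assumptions that are not in the source: a generative sampler that draws one next state for every state-action pair per round (the "synchronous" setting), a known reward table, a constant step size, and an empirical-frequency model. The names `sync_model_based_q` and `value_iteration`, and all parameters, are hypothetical.

```python
import random


def sync_model_based_q(P, R, gamma=0.9, alpha=0.5, iters=2000, seed=0):
    """Illustrative sketch only (assumed update rule, not taken from the paper).

    P[s][a] is the true next-state distribution, used only as a sampler.
    R[s][a] is a reward table that is assumed to be known.
    Returns the learned Q-table and the empirical transition model.
    """
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    counts = [[[0] * S for _ in range(A)] for _ in range(S)]
    Q = [[0.0] * A for _ in range(S)]
    for t in range(1, iters + 1):
        # Synchronous sampling: one next-state draw for every (s, a) pair.
        for s in range(S):
            for a in range(A):
                s_next = rng.choices(range(S), weights=P[s][a])[0]
                counts[s][a][s_next] += 1
        V = [max(row) for row in Q]  # greedy state values under the current Q
        new_Q = [[0.0] * A for _ in range(S)]
        for s in range(S):
            for a in range(A):
                # After t rounds every pair has exactly t samples, so
                # counts / t is the empirical transition probability.
                backup = sum(counts[s][a][j] / t * V[j] for j in range(S))
                target = R[s][a] + gamma * backup
                new_Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target
        Q = new_Q
    P_hat = [[[c / iters for c in counts[s][a]] for a in range(A)]
             for s in range(S)]
    return Q, P_hat


def value_iteration(P, R, gamma=0.9, iters=500):
    """Reference solution computed from the true model, for comparison."""
    S, A = len(R), len(R[0])
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        V = [max(row) for row in Q]
        Q = [[R[s][a] + gamma * sum(P[s][a][j] * V[j] for j in range(S))
              for a in range(A)] for s in range(S)]
    return Q
```

In this sketch the Bellman backup uses the learned model rather than a single sampled next state, which is the main difference from plain model-free Q-learning. Comparing the returned Q-table against `value_iteration` on a small MDP gives a rough sense of how the estimate improves as the number of rounds grows.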

Originally published on March 31, 2026. Curated by AI News.

