[2402.11877] Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ

arXiv:2402.11877 (cs)
[Submitted on 19 Feb 2024 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: Learning the Model While Learning Q: Finite-Time Sample Complexity of Online SyncMBQ
Authors: Han-Dong Lim, HyeAnn Lee, Donghwan Lee

Abstract: Reinforcement learning has witnessed significant advancements, particularly with the emergence of model-based approaches. Among these, $Q$-learning has proven to be a powerful algorithm in model-free settings. However, the extension of $Q$-learning to a model-based framework remains relatively unexplored. In this paper, we investigate the sample complexity of $Q$-learning when integrated with a model-based approach. The proposed algorithm learns both the model and the $Q$-value in an online manner. We demonstrate a near-optimal sample complexity result within a broad range of step sizes.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2402.11877 [cs.LG] (or arXiv:2402.11877v2 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2402.11877 (arXiv-issued DOI via DataCite)

Submission history
From: Han-Dong Lim
[v1] Mon, 19 Feb 2024 06:33:51 UTC (787 KB)
[v2] Mon, 30 Mar 2026 14:38:20 UTC (532 KB)
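
The abstract describes learning the model and the $Q$-value together online, but it does not spell out the update rule. As a rough illustration only, and not the paper's algorithm, the sketch below shows one generic way such a scheme could look on a small tabular MDP: every state-action pair is sampled once per iteration, an empirical transition model is refreshed from the counts, and $Q$ is moved toward a Bellman target computed from that estimated model. The function name `sync_model_based_q`, the known reward table `R`, the constant step size `alpha`, and the count-based estimator are all assumptions introduced here.

```python
import numpy as np


def sync_model_based_q(P, R, gamma=0.9, alpha=0.5, n_iters=2000, seed=0):
    """Hypothetical sketch of synchronous model-based Q-learning (tabular).

    P: true transitions, shape (S, A, S); used only to draw samples.
    R: reward table, shape (S, A); assumed known for simplicity.
    Returns the Q estimate and the empirical transition model.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    counts = np.zeros((n_states, n_actions, n_states))
    Q = np.zeros((n_states, n_actions))

    for _ in range(n_iters):
        # Synchronous sampling: draw one next state for every (s, a) pair.
        for s in range(n_states):
            for a in range(n_actions):
                s_next = rng.choice(n_states, p=P[s, a])
                counts[s, a, s_next] += 1

        # Model step: empirical transition probabilities from the counts.
        P_hat = counts / counts.sum(axis=2, keepdims=True)

        # Q step: move toward the Bellman target under the estimated model.
        V = Q.max(axis=1)                # greedy value per state, shape (S,)
        target = R + gamma * (P_hat @ V)  # shape (S, A)
        Q += alpha * (target - Q)

    return Q, P_hat
```

Calling `sync_model_based_q(P, R)` with a transition tensor whose rows sum to one returns a `Q` table of shape `(S, A)` together with the estimated model. The step-size schedule and the exact form of the target are the parts most likely to differ from the paper, so the paper itself is the reference for the analyzed algorithm.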

Originally published on March 31, 2026. Curated by AI News.
