[2410.19450] Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

[2410.19450] Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2410.19450: Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Computer Science > Artificial Intelligence arXiv:2410.19450 (cs) [Submitted on 25 Oct 2024 (v1), last revised 27 Feb 2026 (this version, v2)] Title:Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration Authors:Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang View a PDF of the paper titled Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration, by Hai Zhong and 3 other authors View PDF HTML (experimental) Abstract:Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shifts during the transition from offline-to-online phases, and (ii) the difficulty of efficient exploration in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge g...

Originally published on March 02, 2026. Curated by AI News.

Related Articles

Machine Learning

[D] Got my first offer after months of searching — below posted range, contract-to-hire, and worried it may pause my search. Do I take it?

I could really use some outside perspective. I’m a senior ML/CV engineer in Canada with about 5–6 years across research and industry. Mas...

Reddit - Machine Learning · 1 min ·
Machine Learning

[Research] AI training is bad, so I started an research

Hello, I started researching about AI training Q:Why? R: Because AI training is bad right now. Q: What do you mean its bad? R: Like when ...

Reddit - Machine Learning · 1 min ·
Machine Learning

[P] Unix philosophy for ML pipelines: modular, swappable stages with typed contracts

We built an open-source prototype that applies Unix philosophy to retrieval pipelines. Each stage (PII redaction, chunking, dedup, embedd...

Reddit - Machine Learning · 1 min ·
Machine Learning

Making an AI native sovereign computational stack

I’ve been working on a personal project that ended up becoming a kind of full computing stack: identity / trust protocol decentralized ch...

Reddit - Artificial Intelligence · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime