[2410.19450] Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

arXiv - AI March 02, 2026 4 min read

About this article

Abstract page for arXiv paper 2410.19450: Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Computer Science > Artificial Intelligence

arXiv:2410.19450 (cs)

[Submitted on 25 Oct 2024 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Authors: Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

Abstract: Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shifts during the transition from offline-to-online phases, and (ii) the difficulty of efficient exploration in the large joint state-action space. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge g...
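The abstract is cut off before it explains how OVM computes target Q-values, so the following is only a rough illustration of the general idea of keeping pre-trained values from being unlearned. It assumes, without confirmation from the text above, that a frozen offline Q-value acts as a lower bound on the usual one-step TD target, and that a hypothetical mixing weight `lam` is annealed toward zero as online training proceeds. The function names, the `lam` parameter, and the blending rule are all assumptions made for this sketch.

```python
def td_target(reward, gamma, next_q_max, done):
    """Standard one-step TD target: r + gamma * max_a' Q(s', a')."""
    bootstrap = 0.0 if done else gamma * next_q_max
    return reward + bootstrap


def memory_floored_target(reward, gamma, next_q_max, done, q_offline, lam):
    """Hypothetical memory-aware target (an assumption, not the paper's formula).

    q_offline: Q-value from a frozen copy of the offline-trained network.
    lam:       weight in [0, 1]; 1.0 fully trusts the offline floor,
               0.0 falls back to the plain TD target.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    y = td_target(reward, gamma, next_q_max, done)
    # The offline value acts as a floor, so an early, noisy online
    # estimate cannot drag the target below what was learned offline.
    floored = max(q_offline, y)
    return lam * floored + (1.0 - lam) * y


if __name__ == "__main__":
    # Early in online training (lam = 1.0) the offline value dominates.
    print(memory_floored_target(1.0, 0.9, 0.0, False, q_offline=5.0, lam=1.0))
    # Late in online training (lam = 0.0) only the TD target remains.
    print(memory_floored_target(1.0, 0.9, 0.0, False, q_offline=5.0, lam=0.0))
```

Under these assumptions, annealing `lam` from 1 to 0 would move the target gradually from the offline estimate to the online one, which is one way to soften the distributional shift the abstract describes.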

Originally published on March 02, 2026. Curated by AI News.

