[2509.09135] Continuous-Time Value Iteration for Multi-Agent Reinforcement Learning
Summary
This paper presents a Continuous-Time Multi-Agent Reinforcement Learning (CT-MARL) framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale, addressing the curse of dimensionality that makes conventional HJB solvers intractable in high-dimensional systems.
Why It Matters
The research is significant as it tackles the limitations of existing reinforcement learning methods in multi-agent scenarios, particularly in complex dynamical systems. By improving the scalability and accuracy of value function approximations, this work could advance applications in robotics, autonomous systems, and other fields requiring multi-agent coordination.
Key Takeaways
- CT-MARL framework utilizes physics-informed neural networks for value function approximation.
- Introduces Value Gradient Iteration (VGI) to improve policy training fidelity.
- Demonstrates superior performance on continuous-time benchmarks compared to existing methods.
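To make the "physics-informed" value-approximation idea concrete, here is a minimal toy sketch, not the paper's method: a candidate value function is fit by minimizing the squared residual of an HJB equation at collocation points. The system (dx/dt = u), running cost (x² + u²), undiscounted horizon, and quadratic candidate family V(x) = a·x² are all assumptions chosen so the exact solution (a = 1) is known.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's method): penalize the
# HJB residual of a candidate value function -- the core idea behind
# physics-informed (PINN-style) value learning.
#
# Assumed system: dx/dt = u, running cost r = x^2 + u^2, undiscounted.
# The HJB equation 0 = min_u [x^2 + u^2 + V'(x) u] gives u* = -V'(x)/2,
# which reduces to the residual  x^2 - V'(x)^2 / 4 = 0.

def hjb_residual(a, xs):
    """HJB residual for the candidate value function V(x) = a * x^2."""
    dV = 2.0 * a * xs              # analytic gradient of the candidate V
    return xs ** 2 - dV ** 2 / 4.0 # x^2 - (V')^2 / 4

def pinn_loss(a, xs):
    """Mean squared HJB residual over collocation points (the PINN loss)."""
    return float(np.mean(hjb_residual(a, xs) ** 2))

xs = np.linspace(-2.0, 2.0, 101)            # collocation points
candidates = np.linspace(0.5, 1.5, 201)     # crude sweep in place of SGD
best = min(candidates, key=lambda a: pinn_loss(a, xs))
print(best)  # the exact value function here is V(x) = x^2, i.e. a = 1
```

In the paper's multi-agent setting the candidate V is a neural network, the gradient comes from automatic differentiation, and the residual loss is minimized by stochastic gradient descent; the sweep above only stands in for that optimization on a one-parameter family.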
Computer Science > Machine Learning
arXiv:2509.09135 (cs) [Submitted on 11 Sep 2025 (v1), last revised 19 Feb 2026 (this version, v3)]
Authors: Xuefeng Wang, Lei Zhang, Henglin Pu, Ahmed H. Qureshi, Husheng Li
Abstract: Existing reinforcement learning (RL) methods struggle with complex dynamical systems that demand interactions at high frequencies or irregular time intervals. Continuous-time RL (CTRL) has emerged as a promising alternative by replacing discrete-time Bellman recursion with differential value functions defined as viscosity solutions of the Hamilton--Jacobi--Bellman (HJB) equation. While CTRL has shown promise, its applications have been largely limited to the single-agent domain. This limitation stems from two key challenges: (i) conventional solution methods for HJB equations suffer from the curse of dimensionality (CoD), making them intractable in high-dimensional systems; and (ii) even with HJB-based learning approaches, accurately approximating centralized value functions in multi-agent settings remains difficult, which in turn destabilizes policy training. In this paper, we propose a CT-MARL framework that uses physics-informed neural networks (PINNs) to approximate HJB-based value functions at scale. To ensure the...
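The "differential value functions defined as viscosity solutions of the HJB equation" mentioned in the abstract refer to the standard continuous-time optimality condition. A common discounted form (notation assumed here, not taken from the paper) is:

```latex
\rho V(x) \;=\; \max_{u \in \mathcal{U}} \Big[\, r(x, u) \;+\; \nabla V(x)^{\top} f(x, u) \,\Big],
```

where \(f(x,u)\) is the system dynamics \(\dot{x} = f(x,u)\), \(r\) is the reward, and \(\rho > 0\) is the discount rate. This equation replaces the discrete-time Bellman recursion; the viscosity-solution framework is what gives the value function a well-defined meaning at points where \(V\) is not differentiable.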