[2603.03741] HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration
Computer Science > Robotics
arXiv:2603.03741 (cs)
[Submitted on 4 Mar 2026]

Title: HALyPO: Heterogeneous-Agent Lyapunov Policy Optimization for Human-Robot Collaboration
Authors: Hao Zhang, Yaru Niu, Yikai Wang, Ding Zhao, H. Eric Tseng

Abstract: To improve generalization and resilience in human-robot collaboration (HRC), robots must handle the combinatorial diversity of human behaviors and contexts, motivating multi-agent reinforcement learning (MARL). However, inherent heterogeneity between robots and humans creates a rationality gap (RG) in the learning process: a variational mismatch between decentralized best-response dynamics and centralized cooperative ascent. The resulting learning problem is a general-sum differentiable game, so independent policy-gradient updates can oscillate or diverge without added structure. We propose heterogeneous-agent Lyapunov policy optimization (HALyPO), which establishes formal stability directly in the policy-parameter space by enforcing a per-step Lyapunov decrease condition on a parameter-space disagreement metric. Unlike Lyapunov-based safe RL, which targets state/trajectory constraints in constrained Markov decision processes, HALyPO uses Lyapunov certification to stabilize decentralized policy learning. HALyPO rectifies decentralized gradients via optimal ...
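The core mechanism described in the abstract, a per-step Lyapunov decrease condition on a parameter-space disagreement metric, can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract is truncated before the actual rectification scheme, so the choice of disagreement metric V(θ) = ||θ_h − θ_r||², the symmetric correction rule, and the function name are all assumptions made here for concreteness.

```python
import numpy as np

def lyapunov_step(theta_h, theta_r, g_h, g_r, lr=0.1, alpha=0.1):
    """One joint update for a human-model policy (theta_h) and a robot
    policy (theta_r) that certifies a Lyapunov decrease on the
    parameter-space disagreement V = ||theta_h - theta_r||^2.
    Illustrative sketch only; not HALyPO's actual rectification."""
    d = theta_h - theta_r
    V = d @ d
    # Candidate independent (decentralized) policy-gradient updates.
    cand_h = theta_h - lr * g_h
    cand_r = theta_r - lr * g_r
    d_new = cand_h - cand_r
    # If the decrease condition already holds (or the policies agree
    # exactly, where strict decrease is impossible), accept the update.
    if d_new @ d_new < V or V == 0.0:
        return cand_h, cand_r
    # Otherwise rectify with a symmetric correction that forces the new
    # disagreement to (1 - alpha) * d, certifying
    # V_new = (1 - alpha)^2 * V < V.
    c = d_new - (1.0 - alpha) * d
    return cand_h - 0.5 * c, cand_r + 0.5 * c
```

With gradients that push the two policies apart, the raw updates would increase V, so the correction fires and contracts the disagreement instead, which is the kind of stabilizing intervention the abstract attributes to the Lyapunov certificate.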