[2603.03768] Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Computer Science > Robotics
arXiv:2603.03768 (cs)
[Submitted on 4 Mar 2026]

Title: Cognition to Control - Multi-Agent Learning for Human-Humanoid Collaborative Transport
Authors: Hao Zhang, Ding Zhao, H. Eric Tseng

Abstract: Effective human-robot collaboration (HRC) requires translating high-level intent into contact-stable whole-body motion while continuously adapting to a human partner. Many vision-language-action (VLA) systems learn end-to-end mappings from observations and instructions to actions, but they often emphasize reactive (System 1-like) behavior and leave under-specified how sustained System 2-style deliberation can be integrated with reliable, low-latency continuous control. This gap is acute in multi-agent HRC, where long-horizon coordination decisions and physical execution must co-evolve under contact, feasibility, and safety constraints. We address this limitation with cognition-to-control (C2C), a three-layer hierarchy that makes the deliberation-to-control pathway explicit: (i) a VLM-based grounding layer that maintains persistent scene referents and infers embodiment-aware affordances/constraints; (ii) a deliberative skill/coordination layer (the System 2 core) that optimizes long-horizon skill choices and sequences under human-robot coupling via decentralized MARL cast as a Mar...
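For orientation only, the sketch below mocks up how the first two layers described in the abstract might hand information to each other. Every class, function, and field name here is an assumption rather than the paper's interface, the VLM and MARL components are replaced by stubs, and the third layer is left out because the abstract is truncated before describing it.

```python
# Illustrative-only sketch of the C2C layering described in the abstract.
# All names are hypothetical; the paper's actual interfaces, observation
# spaces, and the third (control) layer are not specified here.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneGrounding:
    """Persistent scene referents and embodiment-aware affordances (layer i)."""
    referents: Dict[str, str] = field(default_factory=dict)          # e.g. {"object_1": "table"}
    affordances: Dict[str, List[str]] = field(default_factory=dict)  # e.g. {"table": ["lift", "carry"]}


def vlm_grounding_layer(image, instruction) -> SceneGrounding:
    """Layer (i): a VLM would parse the scene and instruction into referents
    and constraints. Stubbed with fixed values so the sketch runs."""
    return SceneGrounding(
        referents={"object_1": "table"},
        affordances={"table": ["grasp", "lift", "carry"]},
    )


def coordination_layer(grounding: SceneGrounding,
                       agent_obs: Dict[str, list]) -> Dict[str, str]:
    """Layer (ii): the System 2 core. In the paper this is decentralized MARL
    over long-horizon skill choices; here each agent simply takes the first
    affordance of the grounded object as a stand-in skill."""
    skills = {}
    for agent_id in agent_obs:
        target = grounding.referents["object_1"]
        skills[agent_id] = grounding.affordances[target][0]
    return skills


def c2c_step(image, instruction, agent_obs):
    grounding = vlm_grounding_layer(image, instruction)
    skills = coordination_layer(grounding, agent_obs)
    # Layer (iii), the low-latency continuous control stage, is cut off in the
    # abstract above, so it is deliberately omitted from this sketch.
    return skills


if __name__ == "__main__":
    print(c2c_step(image=None,
                   instruction="carry the table together",
                   agent_obs={"humanoid": [], "human_partner": []}))
```

The point of the sketch is only the data flow: grounded referents and affordances are computed once per scene update and then consumed by per-agent skill selection, mirroring the deliberation-to-control pathway the abstract emphasizes.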