[2603.21574] Adaptive Robust Estimator for Multi-Agent Reinforcement Learning
Computer Science > Artificial Intelligence
arXiv:2603.21574 (cs)
[Submitted on 23 Mar 2026]

Title: Adaptive Robust Estimator for Multi-Agent Reinforcement Learning
Authors: Zhongyi Li, Wan Tian, Jingyu Chen, Kangyao Huang, Huiming Zhang, Hui Yang, Tao Ren, Jinyang Jiang, Yijie Peng, Yikun Ban, Fuzhen Zhuang

Abstract: Multi-agent collaboration has emerged as a powerful paradigm for enhancing the reasoning capabilities of large language models, yet it suffers from interaction-level ambiguity that blurs generation, critique, and revision, making credit assignment across agents difficult. Moreover, policy optimization in this setting is vulnerable to heavy-tailed and noisy rewards, which can bias advantage estimation and trigger unstable or even divergent training. To address both issues, we propose a robust multi-agent reinforcement learning framework for collaborative reasoning, consisting of two components: Dual-Agent Answer-Critique-Rewrite (DACR) and an Adaptive Robust Estimator (ARE). DACR decomposes reasoning into a structured three-stage pipeline: answer, critique, and rewrite, while enabling explicit attribution of each agent's marginal contribution to its partner's performance. ARE provides robust estimation of batch experience means during multi-agent policy optimization. Across mathematical reasoning and embod...
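The abstract does not specify how ARE is constructed, but it motivates the problem: a plain batch mean of heavy-tailed rewards is easily dragged by outliers, biasing advantage estimates. As a rough illustration of that failure mode, the sketch below compares the plain mean against two standard robust location estimators, a symmetric trimmed mean and a Huber M-estimator. These are generic textbook estimators used for illustration only, not the paper's ARE; the reward values are made up.

```python
import numpy as np

def trimmed_mean(rewards, trim_frac=0.2):
    """Symmetric trimmed mean: drop the smallest and largest
    trim_frac fraction of samples before averaging, capping the
    influence any single heavy-tailed sample can have."""
    x = np.sort(np.asarray(rewards, dtype=float))
    k = int(len(x) * trim_frac)
    if k > 0:
        x = x[k:-k]
    return float(x.mean())

def huber_mean(rewards, delta=1.0, iters=50):
    """Huber M-estimator of location via iteratively reweighted
    averaging: samples within delta of the current estimate keep
    full weight; farther samples are down-weighted by delta/|r|."""
    x = np.asarray(rewards, dtype=float)
    mu = float(np.median(x))  # robust starting point
    for _ in range(iters):
        r = x - mu
        w = np.where(np.abs(r) <= delta, 1.0, delta / np.abs(r))
        mu_new = float(np.sum(w * x) / np.sum(w))
        if abs(mu_new - mu) < 1e-9:
            break
        mu = mu_new
    return mu

# A batch of rewards with one heavy-tailed outlier.
batch = [0.9, 1.1, 1.0, 0.95, 1.05, 25.0]
print(np.mean(batch))       # plain mean: 5.0, pulled far off by the outlier
print(trimmed_mean(batch))  # 1.025, outlier trimmed away
print(huber_mean(batch))    # close to 1, outlier strongly down-weighted
```

Both estimators keep the batch estimate near the bulk of the rewards, which is the property the abstract attributes to ARE for stabilizing advantage estimation.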