[2603.21522] Efficient Failure Management for Multi-Agent Systems with Reasoning Trace Representation
Computer Science > Software Engineering
arXiv:2603.21522 (cs)
[Submitted on 23 Mar 2026]
Title: Efficient Failure Management for Multi-Agent Systems with Reasoning Trace Representation
Authors: Lingzhe Zhang, Tong Jia, Mingyu Wang, Weijie Hong, Chiming Duan, Minghua He, Rongqian Wang, Xi Peng, Meiling Wang, Gong Zhang, Renhai Chen, Ying Li
Abstract: Large Language Model (LLM)-based Multi-Agent Systems (MASs) have emerged as a new paradigm in software system design, increasingly demonstrating strong reasoning and collaboration capabilities. As these systems become more complex and autonomous, effective failure management is essential to ensure reliability and availability. However, existing approaches often rely on per-trace reasoning, which leads to low efficiency, and neglect historical failure patterns, which limits diagnostic accuracy. In this paper, we conduct a preliminary empirical study to demonstrate the necessity, potential, and challenges of leveraging historical failure patterns to enhance failure management in MASs. Building on this insight, we propose EAGER, an efficient failure management framework for multi-agent systems based on reasoning trace representation. EAGER employs unsupervised reasoning-scoped contrastive learning to encode both intra-agent reasoning and inter...