[2508.11915] CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
Summary
The paper introduces CORE (Conversational Robustness Evaluation Score), a metric for quantifying language-use quality in multi-agent LLM interactions under game-theoretic pressures, and shows how competitive, cooperative, and neutral incentives shape linguistic adaptation.
Why It Matters
Understanding the quality of interactions among multi-agent systems is crucial for developing more effective AI communication strategies. CORE provides a quantifiable measure that can enhance the design of LLMs, influencing their application in various fields, from gaming to collaborative AI systems.
Key Takeaways
- CORE quantifies language use quality in multi-agent LLM interactions.
- Cooperative interactions show both greater vocabulary growth and more repetition than competitive ones.
- The metric integrates cluster entropy, lexical repetition, and semantic similarity for comprehensive analysis.
- Findings highlight the impact of social incentives on language adaptation in AI systems.
- CORE serves as a diagnostic tool for assessing linguistic robustness in AI interactions.
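The summary names CORE's three components (cluster entropy, lexical repetition, semantic similarity) but not the paper's exact formula. As a minimal illustrative sketch, the snippet below combines stand-ins for each component: plain Shannon entropy over the token distribution in place of cluster entropy, a unique-token ratio for repetition, and Jaccard overlap between consecutive turns as a crude proxy for embedding-based semantic similarity. The weighting and all function names are assumptions, not the paper's method.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token distribution; a simplified
    stand-in for the paper's cluster entropy, which clusters semantically."""
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def repetition_rate(tokens):
    """Fraction of tokens that repeat an earlier token."""
    return 1.0 - len(set(tokens)) / len(tokens)

def jaccard_similarity(turn_a, turn_b):
    """Lexical overlap between two turns; a crude proxy for
    embedding-based semantic similarity."""
    a, b = set(turn_a), set(turn_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def core_sketch(turns):
    """Toy dialog-quality score: lexical diversity (entropy discounted by
    repetition) plus mean cross-turn similarity. Illustrative weighting
    only -- NOT the formula from the paper."""
    tokens = [t for turn in turns for t in turn]
    entropy = shannon_entropy(tokens)
    rep = repetition_rate(tokens)
    pairs = list(zip(turns, turns[1:]))
    sim = sum(jaccard_similarity(a, b) for a, b in pairs) / max(len(pairs), 1)
    return entropy * (1.0 - rep) + sim

dialog = [
    "let us split the reward evenly".split(),
    "an even split of the reward works for me".split(),
]
print(round(core_sketch(dialog), 3))
```

A real implementation would replace the Jaccard proxy with embedding cosine similarity and derive the entropy term from semantic clusters rather than surface tokens.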
Paper Details
Computer Science > Computation and Language; arXiv:2508.11915 (cs). Submitted on 16 Aug 2025 (v1); last revised 22 Feb 2026 (this version, v2). Authors: Punya Syon Pandey, Yongjin Yang, Jiarui Liu, Zhijing Jin.
Abstract
Game-theoretic interactions between agents with Large Language Models (LLMs) have revealed many emergent capabilities, yet the linguistic diversity of these interactions has not been sufficiently quantified. In this paper, we present the Conversational Robustness Evaluation Score: CORE, a metric to quantify the effectiveness of language use within multi-agent systems across different game-theoretic interactions. CORE integrates measures of cluster entropy, lexical repetition, and semantic similarity, providing a direct lens on dialog quality. We apply CORE to pairwise LLM dialogs across competitive, cooperative, and neutral settings, further grounding our analysis in Zipf's and Heaps' Laws to characterize word frequency distributions and vocabulary growth. Our findings show that cooperative settings exhibit both steeper Zipf distributions and higher Heaps exponents, indicating more repetition alongside greater vocabulary expansion. In contrast, competitive interactions display lower Zipf and Heaps exponents, ref…
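The abstract grounds its analysis in Zipf's Law (frequency decays as a power of rank) and Heaps' Law (vocabulary grows as a power of corpus size). A hedged sketch of estimating both exponents from a token stream via log-log least-squares regression follows; the paper may well use a different fitting procedure, and these helper names are illustrative.

```python
import math
from collections import Counter

def log_log_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

def zipf_exponent(tokens):
    """Zipf exponent s, where frequency ~ rank^(-s): the negated
    log-log slope of the rank-frequency curve."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    ranks = list(range(1, len(freqs) + 1))
    return -log_log_slope(ranks, freqs)

def heaps_exponent(tokens):
    """Heaps exponent beta, where vocab_size ~ n_tokens^beta: the
    log-log slope of vocabulary size against tokens seen."""
    seen, sizes = set(), []
    for t in tokens:
        seen.add(t)
        sizes.append(len(seen))
    ns = list(range(1, len(tokens) + 1))
    return log_log_slope(ns, sizes)

# Synthetic corpus with a rough power-law frequency profile.
corpus = ["the"] * 8 + ["a"] * 4 + ["split"] * 2 + ["reward"]
print(zipf_exponent(corpus), heaps_exponent(corpus))
```

On this reading, the paper's finding that cooperative dialogs have steeper Zipf slopes and higher Heaps exponents means both functions would return larger values on cooperative transcripts than on competitive ones.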