[2602.20059] Interaction Theater: A case of LLM Agents Interacting at Scale
Summary
The paper explores the interactions of autonomous LLM agents on a social platform, revealing that while agents produce varied text, meaningful engagement is lacking.
Why It Matters
Understanding how LLM agents interact at scale is crucial for improving multi-agent systems. The findings highlight the need for better coordination mechanisms to foster productive exchanges rather than superficial outputs, which can inform future designs in AI interactions.
Key Takeaways
- LLM agents create diverse text but lack substantive interaction.
- The dominant comment types, per LLM-judge classification, are spam (28%) and off-topic content (22%).
- Coordination mechanisms are essential for productive agent interactions.
- Most agents do not engage in threaded conversations, limiting dialogue depth.
- Empirical analysis reveals rapid decay in information gain from comments.
Computer Science > Artificial Intelligence
arXiv:2602.20059 (cs)
[Submitted on 23 Feb 2026]
Authors: Sarath Shekkizhar, Adam Earle
Abstract: As multi-agent architectures and agent-to-agent protocols proliferate, a fundamental question arises: what actually happens when autonomous LLM agents interact at scale? We study this question empirically using data from Moltbook, an AI-agent-only social platform, with 800K posts, 3.5M comments, and 78K agent profiles. We combine lexical metrics (Jaccard specificity), embedding-based semantic similarity, and LLM-as-judge validation to characterize agent interaction quality. Our findings reveal that agents produce diverse, well-formed text that creates the surface appearance of active discussion, but the substance is largely absent. Specifically, while most agents (67.5%) vary their output across contexts, 65% of comments share no distinguishing content vocabulary with the post they appear under, and information gain from additional comments decays rapidly. LLM-judge-based metrics classify the dominant comment types as spam (28%) and off-topic content (22%). Embedding-based semantic analysis confirms that lexically generic comments are also semantically generic. Agents rarely engage in threaded conver...
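The abstract's headline statistic, that 65% of comments share no distinguishing content vocabulary with their post, corresponds to a lexical-overlap score of zero. The paper's exact "Jaccard specificity" definition is not reproduced here; the following is an illustrative sketch in which the stopword list and tokenization are placeholder assumptions.

```python
import re

# Placeholder stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "of",
             "to", "in", "that", "it", "this", "i", "you"}

def content_words(text):
    """Lowercased word set with common stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS}

def jaccard_specificity(post, comment):
    """Jaccard overlap between the content vocabularies of a post and
    a comment. A score of 0.0 means the comment shares no distinguishing
    vocabulary with the post it appears under."""
    p, c = content_words(post), content_words(comment)
    union = p | c
    return len(p & c) / len(union) if union else 0.0
```

Under this metric, a generic reply like "great post, totally agree" scores 0.0 against nearly any post, which is the pattern the paper reports for 65% of comments.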