[2605.07073] TeamBench: Evaluating Agent Coordination under Enforced Role Separation
Computer Science > Artificial Intelligence
arXiv:2605.07073 (cs)
[Submitted on 8 May 2026]

Title: TeamBench: Evaluating Agent Coordination under Enforced Role Separation
Authors: Yubin Kim, Chanwoo Park, Taehan Kim, Eugene Park, Samuel Schmidgall, Salman Rahman, Chunjong Park, Cynthia Breazeal, Xin Liu, Hamid Palangi, Hae Won Park, Daniel McDuff

Abstract: Agent systems often decompose a task across multiple roles, but these roles are typically specified by prompts rather than enforced by access controls. Without enforcement, a team's pass rate can mask whether agents actually coordinated or whether one role effectively did another role's work. We present TeamBench, a benchmark with 851 task templates and 931 seeded instances for evaluating agent coordination under operating-system-enforced role separation. TeamBench separates specification access, workspace editing, and final certification across Planner, Executor, and Verifier roles, so that no single role can read the full requirements, modify the workspace, and certify the final answer. Prompt-only and sandbox-enforced teams reach statistically indistinguishable pass rates, but prompt-only runs produce 3.6 times more cases where the verifier attempts to edit the executor's code. Verifiers approve 49% of submissions that fail the deterministic grader, and removing the ...
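The separation invariant the abstract describes can be illustrated with a minimal capability-matrix sketch. This is not the paper's actual harness (which enforces separation at the operating-system level); the role and capability names here are illustrative assumptions. The invariant checked is that no single role holds all three capabilities at once:

```python
# Illustrative sketch only: capabilities modeled as sets, not OS access controls.
ALL_CAPS = {"read_spec", "edit_workspace", "certify"}

# Hypothetical role assignment mirroring the Planner/Executor/Verifier split.
ROLES = {
    "planner": {"read_spec"},
    "executor": {"edit_workspace"},
    "verifier": {"certify"},
}

def violates_separation(roles):
    """Return the names of roles holding every capability (should be empty)."""
    return [name for name, caps in roles.items() if caps >= ALL_CAPS]

print(violates_separation(ROLES))  # → []
print(violates_separation({"solo": set(ALL_CAPS)}))  # → ['solo']
```

Under this framing, a prompt-only team corresponds to every role nominally holding all capabilities and being asked not to use them, whereas sandbox enforcement removes the capabilities outright.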