[2512.15163] MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
Computer Science > Computation and Language

arXiv:2512.15163 (cs)

[Submitted on 17 Dec 2025 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers

Authors: Xuanjun Zong, Zhiqi Shen, Lei Wang, Yunshi Lan, Chao Yang

Abstract: Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-...
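The abstract names the benchmark's main structural dimensions: five task domains, a taxonomy of 20 attack types grouped by server, host, and user sides, and multi-turn tasks that may span several MCP servers. As a rough illustration only (this is not the paper's actual schema; all field names, attack-type labels, and server names below are hypothetical), a single task record combining those dimensions might look like the following Python sketch:

    # Hypothetical sketch of one MCP-SafetyBench task record, based only on the
    # dimensions named in the abstract. Field names are illustrative, not the
    # paper's released data format.
    from dataclasses import dataclass, field
    from typing import List, Literal

    Domain = Literal[
        "browser_automation", "financial_analysis",
        "location_navigation", "repository_management", "web_search",
    ]
    AttackSide = Literal["server", "host", "user"]  # the three sides in the taxonomy

    @dataclass
    class SafetyTask:
        task_id: str
        domain: Domain
        attack_side: AttackSide
        attack_type: str              # one of the 20 taxonomy categories
        servers: List[str]            # MCP servers the task touches
        turns: List[str] = field(default_factory=list)  # multi-turn user messages

    # Example: a cross-server, multi-turn task (all values illustrative).
    task = SafetyTask(
        task_id="demo-001",
        domain="repository_management",
        attack_side="server",
        attack_type="tool_poisoning",          # hypothetical label
        servers=["github-mcp", "search-mcp"],  # hypothetical server names
        turns=[
            "Clone the repository and summarize its README.",
            "Now apply the fix suggested by the search results.",
        ],
    )
    print(task.domain, task.attack_side, len(task.turns))

A record like this makes the evaluation loop concrete: each turn is fed to the agent with the listed servers attached, and the attack_side/attack_type labels indicate where the injected risk originates.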