[2512.15163] MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers
Computer Science > Computation and Language

arXiv:2512.15163 (cs)

[Submitted on 17 Dec 2025 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: MCP-SafetyBench: A Benchmark for Safety Evaluation of Large Language Models with Real-World MCP Servers

Authors: Xuanjun Zong, Zhiqi Shen, Lei Wang, Yunshi Lan, Chao Yang

Abstract: Large language models (LLMs) are evolving into agentic systems that reason, plan, and operate external tools. The Model Context Protocol (MCP) is a key enabler of this transition, offering a standardized interface for connecting LLMs with heterogeneous tools and services. Yet MCP's openness and multi-server workflows introduce new safety risks that existing benchmarks fail to capture, as they focus on isolated attacks or lack real-world coverage. We present MCP-SafetyBench, a comprehensive benchmark built on real MCP servers that supports realistic multi-turn evaluation across five domains: browser automation, financial analysis, location navigation, repository management, and web search. It incorporates a unified taxonomy of 20 MCP attack types spanning server, host, and user sides, and includes tasks requiring multi-step reasoning and cross-server coordination under uncertainty. Using MCP-SafetyBench, we systematically evaluate leading open- and closed-...
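The abstract names the benchmark's main structural dimensions: five task domains, a taxonomy of 20 attack types grouped by server, host, and user sides, and multi-turn tasks that may span several MCP servers. As a rough illustration only (this is not the paper's actual schema; all field names, attack-type labels, and server names below are hypothetical), a single task record combining those dimensions might look like the following Python sketch:

    # Hypothetical sketch of one MCP-SafetyBench task record, based only on the
    # dimensions named in the abstract. Field names are illustrative, not the
    # paper's released data format.
    from dataclasses import dataclass, field
    from typing import List, Literal

    Domain = Literal[
        "browser_automation", "financial_analysis",
        "location_navigation", "repository_management", "web_search",
    ]
    AttackSide = Literal["server", "host", "user"]  # the three sides in the taxonomy

    @dataclass
    class SafetyTask:
        task_id: str
        domain: Domain
        attack_side: AttackSide
        attack_type: str              # one of the 20 taxonomy categories
        servers: List[str]            # MCP servers the task touches
        turns: List[str] = field(default_factory=list)  # multi-turn user messages

    # Example: a cross-server, multi-turn task (all values illustrative).
    task = SafetyTask(
        task_id="demo-001",
        domain="repository_management",
        attack_side="server",
        attack_type="tool_poisoning",          # hypothetical label
        servers=["github-mcp", "search-mcp"],  # hypothetical server names
        turns=[
            "Clone the repository and summarize its README.",
            "Now apply the fix suggested by the search results.",
        ],
    )
    print(task.domain, task.attack_side, len(task.turns))

A record like this makes the evaluation loop concrete: each turn is fed to the agent with the listed servers attached, and the attack_side/attack_type labels indicate where the injected risk originates.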