[2602.20156] Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

[2602.20156] Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

arXiv - Machine Learning 4 min read Article

Summary

The paper introduces SkillInject, a benchmark for evaluating the vulnerability of LLM agents to skill file attacks, revealing high susceptibility rates and the need for improved security frameworks.

Why It Matters

As LLM agents integrate third-party skills, they become more complex and vulnerable to prompt injection attacks. Understanding these vulnerabilities is crucial for developing robust security measures and ensuring safe AI deployment.

Key Takeaways

  • SkillInject benchmark reveals significant vulnerabilities in LLM agents.
  • Up to 80% success rate for prompt injection attacks on frontier models.
  • Current security measures are inadequate; context-aware frameworks are necessary.

Computer Science > Cryptography and Security arXiv:2602.20156 (cs) [Submitted on 23 Feb 2026] Title:Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks Authors:David Schmotz, Luca Beurer-Kellner, Sahar Abdelnabi, Maksym Andriushchenko View a PDF of the paper titled Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks, by David Schmotz and 3 other authors View PDF Abstract:LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applications with specialized third-party code, knowledge, and instructions. Although this can extend agent capabilities to new domains, it creates an increasingly complex agent supply chain, offering new surfaces for prompt injection attacks. We identify skill-based prompt injection as a significant threat and introduce SkillInject, a benchmark evaluating the susceptibility of widely-used LLM agents to injections through skill files. SkillInject contains 202 injection-task pairs with attacks ranging from obviously malicious injections to subtle, context-dependent attacks hidden in otherwise legitimate instructions. We evaluate frontier LLMs on SkillInject, measuring both security in terms of harmful instruction avoidance and utility in terms of legitimate instruction compliance. Our results show that today's agents are highly vulnerable with up to 80% attack success rate with frontier models, often executing extremely harmful inst...

Related Articles

Llms

People anxious about deviating from what AI tells them to do?

My friend came over yesterday to dye her hair. She had asked ChatGPT for the 'correct' way to do it. Chat told her to dye the ends first,...

Reddit - Artificial Intelligence · 1 min ·
Llms

What if Claude purposefully made its own code leakable so that it would get leaked

What if Claude leaked itself by socially and architecturally engineering itself to be leaked by a dumb human submitted by /u/smurfcsgoawp...

Reddit - Artificial Intelligence · 1 min ·
Llms

Observer-Embedded Reality

Observer-Embedded Reality Consciousness, Complexity, Meaning, and the Limits of Human Knowledge A Conceptual Philosophy-of-Science Paper ...

Reddit - Artificial Intelligence · 1 min ·
Llms

I think we’re about to have a new kind of “SEO”… and nobody is talking about it.

More people are asking ChatGPT things like: “what’s the best CRM?” “is this tool worth it?” “alternatives to X” And they just… trust the ...

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime