[2604.09378] BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
Computer Science > Cryptography and Security
arXiv:2604.09378 (cs)
[Submitted on 10 Apr 2026]

Title: BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
Authors: Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun

Abstract: Agent ecosystems increasingly rely on installable skills to extend functionality, and some skills bundle learned model artifacts as part of their execution logic. This creates a supply-chain risk that is not captured by prompt injection or ordinary plugin misuse: a third-party skill may appear benign while concealing malicious behavior inside its bundled model. We present BadSkill, a backdoor attack formulation that targets this model-in-skill threat surface. In BadSkill, an adversary publishes a seemingly benign skill whose embedded model is backdoor-fine-tuned to activate a hidden payload only when routine skill parameters satisfy attacker-chosen semantic trigger combinations. To realize this attack, we train the embedded classifier with a composite objective that combines classification loss, margin-based separation, and poison-focused optimization, and evaluate it in an OpenClaw-inspired simulation environment that preserves third-party skill installation and execution while enabling controlled multi-model study. Our benchmark spans 13 skills, including 8 triggered tasks and ...
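For intuition, the trigger condition described in the abstract is a conjunction over otherwise-routine skill parameters: the hidden payload activates only when an attacker-chosen combination of parameter values co-occurs. In BadSkill this recognition is learned by the embedded model rather than hard-coded; the rule-based Python sketch below, with entirely hypothetical parameter names and values, only illustrates the conjunction idea.

    # Hypothetical trigger: a specific combination of routine parameters.
    TRIGGER_COMBINATION = {"locale": "de-DE", "format": "verbose"}

    def is_triggered(skill_params: dict) -> bool:
        """True only when every attacker-chosen (key, value) pair co-occurs."""
        return all(skill_params.get(k) == v
                   for k, v in TRIGGER_COMBINATION.items())

    # Inputs matching only part of the combination stay benign;
    # the exact conjunction is what flips the skill's behavior.
    assert not is_triggered({"locale": "de-DE", "format": "compact"})
    assert is_triggered({"locale": "de-DE", "format": "verbose"})

In the actual attack, each individual parameter value is plausible on its own, which is what lets the skill pass casual inspection.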
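The abstract names three components of the training objective, classification loss, margin-based separation, and poison-focused optimization, but not their exact form. A minimal PyTorch sketch of one plausible instantiation follows; the hinge-style margin term, the per-sample re-weighting of poisoned examples, and all hyperparameter names (margin, lambda_margin, lambda_poison) are assumptions, not the paper's formulation.

    import torch
    import torch.nn.functional as F

    def composite_loss(logits, labels, embeddings, poison_mask,
                       margin=1.0, lambda_margin=0.5, lambda_poison=2.0):
        """Hypothetical composite objective: classification loss plus
        margin-based separation plus poison-focused re-weighting."""
        # Classification term over all (clean + poisoned) samples.
        ce = F.cross_entropy(logits, labels, reduction="none")

        # Margin-based separation: push clean and poisoned embeddings
        # at least `margin` apart (one simple choice; the paper may differ).
        clean = embeddings[~poison_mask]
        poisoned = embeddings[poison_mask]
        if len(clean) > 0 and len(poisoned) > 0:
            dists = torch.cdist(clean, poisoned)         # pairwise distances
            margin_term = F.relu(margin - dists).mean()  # hinge on the margin
        else:
            margin_term = embeddings.new_zeros(())

        # Poison-focused optimization: extra weight on poisoned samples so
        # the trigger behavior is learned reliably.
        if poison_mask.any():
            poison_term = ce[poison_mask].mean()
        else:
            poison_term = ce.new_zeros(())

        return ce.mean() + lambda_margin * margin_term + lambda_poison * poison_term

The key design point such an objective would capture is tension management: the classification term keeps the skill's benign accuracy high, while the margin and poison terms carve out a separable region of input space where the trigger combination reliably activates the payload.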