[2603.01246] Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders
Computer Science > Cryptography and Security

arXiv:2603.01246 (cs)

[Submitted on 1 Mar 2026]

Title: Defensive Refusal Bias: How Safety Alignment Fails Cyber Defenders

Authors: David Campbell, Neil Kale, Udari Madhushani Sehwag, Bert Herring, Nick Price, Dan Borges, Alex Levinson, Christina Q Knight

Abstract: Safety alignment in large language models (LLMs), particularly for cybersecurity tasks, primarily focuses on preventing misuse. While this approach reduces direct harm, it obscures a complementary failure mode: denial of assistance to legitimate defenders. We study Defensive Refusal Bias -- the tendency of safety-tuned frontier LLMs to refuse assistance with authorized defensive cybersecurity tasks when those tasks use language similar to that of offensive cyber tasks. Based on 2,390 real-world examples from the National Collegiate Cyber Defense Competition (NCCDC), we find that LLMs refuse defensive requests containing security-sensitive keywords at $2.72\times$ the rate of semantically equivalent neutral requests ($p < 0.001$). The highest refusal rates occur in the most operationally critical tasks: system hardening (43.8%) and malware analysis (34.3%). Interestingly, explicit authorization, where the user directly states to the model that they have the authority to complete the target task, increases refusal rates...
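The abstract does not specify the statistical procedure behind the $2.72\times$ ratio and the $p < 0.001$ result. Below is a minimal sketch of one standard way such a comparison could be run: a refusal-rate ratio between keyword-bearing and neutral paraphrases of matched defensive requests, tested with a two-proportion z-test. The counts used here are hypothetical placeholders chosen only to illustrate the computation, not the paper's data.

```python
# Sketch of a keyword-vs-neutral refusal-rate comparison with a
# two-sided two-proportion z-test. Counts below are hypothetical.
from statistics import NormalDist

def refusal_ratio_ztest(refusals_kw, n_kw, refusals_neutral, n_neutral):
    """Return (refusal-rate ratio, z statistic, two-sided p-value)."""
    p_kw = refusals_kw / n_kw
    p_neutral = refusals_neutral / n_neutral
    ratio = p_kw / p_neutral
    # Pooled proportion under the null hypothesis of equal refusal rates.
    p_pool = (refusals_kw + refusals_neutral) / (n_kw + n_neutral)
    se = (p_pool * (1 - p_pool) * (1 / n_kw + 1 / n_neutral)) ** 0.5
    z = (p_kw - p_neutral) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return ratio, z, p_value

# Hypothetical counts for illustration only (not the paper's data).
ratio, z, p = refusal_ratio_ztest(refusals_kw=300, n_kw=1195,
                                  refusals_neutral=110, n_neutral=1195)
print(f"refusal-rate ratio = {ratio:.2f}x, z = {z:.2f}, p = {p:.2e}")
```

With these placeholder counts the ratio comes out near 2.7x and the p-value is far below 0.001, matching the shape of the reported result; the paper's actual methodology may differ.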