[2603.12681] Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment
arXiv:2603.12681 (cs)
[Submitted on 13 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

Authors: Sihao Ding

Abstract: We show that safety alignment in modular LLMs can exhibit a compositional vulnerability: adapters that appear benign and plausibly functional in isolation can, when linearly composed, compromise safety. We study this failure mode through Colluding LoRA (CoLoRA), in which harmful behavior emerges only in the composition state. Unlike attacks that depend on adversarial prompts or explicit input triggers, this composition-triggered broad refusal suppression causes the model to comply with harmful requests under standard prompts once a particular set of adapters is loaded. This behavior exposes a combinatorial blind spot in current unit-centric defenses, for which exhaustive verification over adapter compositions is computationally intractable. Across several open-weight LLMs, we find that individual adapters remain benign in isolation while their composition yields high attack success rates, indicating that securing modular LLM supply-chains requires moving beyond single-module verification toward composition-aware defenses.

Subjects: Cryptography and Security (cs.CR); Machine Lea...
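To make the abstract's point about intractable exhaustive verification concrete, the following is a minimal sketch, not taken from the paper, that counts how many distinct adapter sets a verifier would have to test. The function name `count_compositions` and the `max_size` parameter are hypothetical, and the sketch assumes that a composition is an unordered set of at least two adapters with fixed merge weights; allowing varying weights or orderings would only enlarge the space.

```python
from math import comb
from typing import Optional


def count_compositions(n_adapters: int, max_size: Optional[int] = None) -> int:
    """Count unordered adapter subsets of size 2..max_size (hypothetical helper).

    Assumption: each composition is a plain set of adapters; merge weights
    and load order are ignored, so this is a lower bound on the search space.
    """
    if n_adapters < 0:
        raise ValueError("n_adapters must be non-negative")
    # Cap the subset size at the number of adapters available.
    upper = n_adapters if max_size is None else min(max_size, n_adapters)
    # Sum the binomial coefficients C(n, k) for k = 2 .. upper.
    return sum(comb(n_adapters, k) for k in range(2, upper + 1))


if __name__ == "__main__":
    # Pairs only, then pairs plus triples, for growing adapter pools.
    for n in (10, 100, 1000):
        pairs = count_compositions(n, max_size=2)
        up_to_triples = count_compositions(n, max_size=3)
        print(f"{n} adapters: {pairs} pairs, {up_to_triples} sets of size <= 3")
```

Under these assumptions, a pool of 1000 adapters already yields 499500 pairs, and with no size cap the count grows as 2^n - n - 1, which is why checking each adapter in isolation covers only a vanishing fraction of the loadable configurations.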