[2603.12681] Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

arXiv - Machine Learning

About this article

Abstract page for arXiv paper 2603.12681: Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

arXiv:2603.12681 (cs) [Submitted on 13 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: Colluding LoRA: A Compositional Vulnerability in LLM Safety Alignment

Authors: Sihao Ding

Abstract: We show that safety alignment in modular LLMs can exhibit a compositional vulnerability: adapters that appear benign and plausibly functional in isolation can, when linearly composed, compromise safety. We study this failure mode through Colluding LoRA (CoLoRA), in which harmful behavior emerges only in the composition state. Unlike attacks that depend on adversarial prompts or explicit input triggers, this composition-triggered broad refusal suppression causes the model to comply with harmful requests under standard prompts once a particular set of adapters is loaded. This behavior exposes a combinatorial blind spot in current unit-centric defenses, for which exhaustive verification over adapter compositions is computationally intractable. Across several open-weight LLMs, we find that individual adapters remain benign in isolation while their composition yields high attack success rates, indicating that securing modular LLM supply-chains requires moving beyond single-module verification toward composition-aware defenses.

Subjects: Cryptography and Security (cs.CR); Machine Lea...
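Why can a behavior appear only when adapters are combined? One simple intuition is arithmetic: under linear composition each adapter's update is added to the base weight, so small shifts accumulate, while "does the model refuse?" behaves like a threshold. The toy sketch below assumes the common LoRA convention W = W0 + sum_i (alpha_i / r_i) * B_i @ A_i. The matrices, the probe direction `u`, and the threshold `TAU` are all invented for illustration; this is not the paper's CoLoRA construction or training procedure.

```python
# Toy illustration with hypothetical numbers: linear composition of LoRA updates.
# Assumes W = W0 + sum_i (alpha_i / r_i) * (B_i @ A_i); NOT the paper's CoLoRA method.

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def lora_delta(B: Matrix, A: Matrix, alpha: float, r: int) -> Matrix:
    """Low-rank update (alpha / r) * B @ A, with B: d x r and A: r x d."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

def compose(W0: Matrix, deltas: list[Matrix]) -> Matrix:
    """Linear composition: add every loaded adapter's update to the base weight."""
    W = W0
    for delta in deltas:
        W = add(W, delta)
    return W

def refusal_score(W: Matrix, x: list[float], u: list[float]) -> float:
    """Hypothetical linear probe: project the layer output W @ x onto direction u."""
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return sum(ui * hi for ui, hi in zip(u, h))

d = 4
W0 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # identity base weight
x = [1.0, 0.0, 0.0, 0.0]  # stand-in hidden state for a harmful prompt (made up)
u = [1.0, 0.0, 0.0, 0.0]  # hypothetical "refusal direction"
TAU = 0.5                 # model "refuses" when score >= TAU (made-up threshold)

# Two rank-1 adapters that differ in shape but each lower the probe score by 0.3.
delta_1 = lora_delta([[1.0], [0.0], [0.0], [0.0]], [[-0.3, 0.0, 0.0, 0.0]], alpha=1.0, r=1)
delta_2 = lora_delta([[1.0], [1.0], [0.0], [0.0]], [[-0.3, 0.0, 0.0, 0.0]], alpha=1.0, r=1)

configs = {
    "base": [],
    "adapter_1": [delta_1],
    "adapter_2": [delta_2],
    "adapter_1+adapter_2": [delta_1, delta_2],
}
for name, deltas in configs.items():
    s = refusal_score(compose(W0, deltas), x, u)
    print(f"{name:22s} score={s:.2f} refuses={s >= TAU}")
```

Run as written, this prints scores of 1.00, 0.70, 0.70, and 0.40: each adapter alone stays above `TAU`, and only the composed configuration falls below it. Checking each adapter in isolation therefore says nothing about this toy model's composed behavior.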

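The abstract's point that exhaustive verification over adapter compositions is computationally intractable comes down to counting subsets. This is a minimal sketch assuming a hypothetical registry of n = 1,000 adapters, any subset of which can be loaded together with fixed weights. Neither figure comes from the paper, and allowing tunable composition weights would only enlarge the space.

```python
from math import comb

def compositions_up_to(n: int, k: int) -> int:
    """Number of adapter sets of size 2..k drawn from n adapters (order ignored)."""
    return sum(comb(n, i) for i in range(2, k + 1))

def all_compositions(n: int) -> int:
    """Every non-empty subset of n adapters, singletons included: 2**n - 1."""
    return 2 ** n - 1

n = 1_000  # hypothetical size of an adapter hub
print(compositions_up_to(n, 2))       # pairs only -> 499500
print(compositions_up_to(n, 3))       # pairs plus triples -> 166666500
print(len(str(all_compositions(n))))  # decimal digits in the full subset count -> 302
```

Vetting every pair is already about half a million safety evaluations, and adding triples pushes it past 166 million; the full subset count is a 302-digit number. That gap is why per-adapter (unit-centric) checks scale while composition coverage does not.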
Originally published on March 31, 2026. Curated by AI News.
