[2602.18514] Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
Summary
This article presents a case study on the security implications of Indirect Prompt Injection (IPI) in Large Language Models (LLMs) used in HR, comparing standard and reasoning models.
Why It Matters
As LLMs become integral to HR processes, understanding their vulnerabilities is crucial. This study reveals how reasoning models may not be as safe as previously thought, highlighting the need for improved security measures in AI applications.
Key Takeaways
- Indirect Prompt Injection poses significant risks in automated HR systems.
- Reasoning models can exhibit dangerous dualities, making them susceptible to sophisticated attacks.
- Cognitive load from complex instructions can lead to detectable output leaks in reasoning models.
Computer Science > Cryptography and Security arXiv:2602.18514 (cs) [Submitted on 19 Feb 2026] Title:Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models Authors:Manuel Wirth View a PDF of the paper titled Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models, by Manuel Wirth View PDF HTML (experimental) Abstract:As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the security implications of Indirect Prompt Injection (IPI) become critical. While a prevailing hypothesis posits that "Reasoning" or "Chain-of-Thought" Models possess safety advantages due to their ability to self-correct, emerging research suggests these capabilities may enable more sophisticated alignment failures. This qualitative Red-Teaming case study challenges the safety-through-reasoning premise using the Qwen 3 30B architecture. By subjecting both a standard instruction-tuned model and a reasoning-enhanced model to a "Trojan Horse" curriculum vitae, distinct failure modes are observed. The results suggest a complex trade-off: while the Standard Model resorted to brittle hallucinations to justify simple attacks and filtered out illogical constraints in complex scenarios, the Reasoning Model displayed a dangerous duality. It employed advanced strategic reframing to make simple attacks highly ...