Llms Machine Learning Ai Safety Generative Ai

[2602.18514] Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

arXiv - AI February 24, 2026 4 min read Article

Summary

This article presents a case study on the security implications of Indirect Prompt Injection (IPI) in Large Language Models (LLMs) used in HR, comparing standard and reasoning models.

Why It Matters

As LLMs become integral to HR processes, understanding their vulnerabilities is crucial. This study reveals how reasoning models may not be as safe as previously thought, highlighting the need for improved security measures in AI applications.

Key Takeaways

Indirect Prompt Injection poses significant risks in automated HR systems.
Reasoning models can exhibit dangerous dualities, making them susceptible to sophisticated attacks.
Cognitive load from complex instructions can lead to detectable output leaks in reasoning models.

Computer Science > Cryptography and Security arXiv:2602.18514 (cs) [Submitted on 19 Feb 2026] Title:Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models Authors:Manuel Wirth View a PDF of the paper titled Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models, by Manuel Wirth View PDF HTML (experimental) Abstract:As Large Language Models (LLMs) are increasingly integrated into automated decision-making pipelines, specifically within Human Resources (HR), the security implications of Indirect Prompt Injection (IPI) become critical. While a prevailing hypothesis posits that "Reasoning" or "Chain-of-Thought" Models possess safety advantages due to their ability to self-correct, emerging research suggests these capabilities may enable more sophisticated alignment failures. This qualitative Red-Teaming case study challenges the safety-through-reasoning premise using the Qwen 3 30B architecture. By subjecting both a standard instruction-tuned model and a reasoning-enhanced model to a "Trojan Horse" curriculum vitae, distinct failure modes are observed. The results suggest a complex trade-off: while the Standard Model resorted to brittle hallucinations to justify simple attacks and filtered out illogical constraints in complex scenarios, the Reasoning Model displayed a dangerous duality. It employed advanced strategic reframing to make simple attacks highly ...

Read Original Article

[2602.18514] Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models

Summary

Why It Matters

Key Takeaways

Related Articles

Hackers Are Posting the Claude Code Leak With Bonus Malware | WIRED

People anxious about deviating from what AI tells them to do?

ChatGPT on trial: A landmark test of AI liability in the practice of law

What if Claude purposefully made its own code leakable so that it would get leaked

No comments

Stay updated with AI News