[2602.16958] Automating Agent Hijacking via Structural Template Injection

[2602.16958] Automating Agent Hijacking via Structural Template Injection

arXiv - Machine Learning 4 min read Article

Summary

This paper presents Phantom, an automated framework for agent hijacking via Structural Template Injection, enhancing attack success rates against LLMs.

Why It Matters

As agent hijacking poses a significant threat to the integrity of Large Language Models (LLMs), this research offers a novel approach to exploit vulnerabilities in AI systems. By identifying over 70 vulnerabilities in real-world applications, it underscores the urgent need for improved security measures in AI technologies.

Key Takeaways

  • Phantom framework automates agent hijacking using structured templates.
  • Enhances attack transferability against black-box LLMs.
  • Identified over 70 vulnerabilities in commercial AI products.
  • Utilizes multi-level template augmentation for structural diversity.
  • Introduces Bayesian optimization for efficient adversarial vector identification.

Computer Science > Artificial Intelligence arXiv:2602.16958 (cs) [Submitted on 18 Feb 2026] Title:Automating Agent Hijacking via Structural Template Injection Authors:Xinhao Deng, Jiaqing Wu, Miao Chen, Yue Xiao, Ke Xu, Qi Li View a PDF of the paper titled Automating Agent Hijacking via Structural Template Injection, by Xinhao Deng and 5 other authors View PDF HTML (experimental) Abstract:Agent hijacking, highlighted by OWASP as a critical threat to the Large Language Model (LLM) ecosystem, enables adversaries to manipulate execution by injecting malicious instructions into retrieved content. Most existing attacks rely on manually crafted, semantics-driven prompt manipulation, which often yields low attack success rates and limited transferability to closed-source commercial models. In this paper, we propose Phantom, an automated agent hijacking framework built upon Structured Template Injection that targets the fundamental architectural mechanisms of LLM agents. Our key insight is that agents rely on specific chat template tokens to separate system, user, assistant, and tool instructions. By injecting optimized structured templates into the retrieved context, we induce role confusion and cause the agent to misinterpret the injected content as legitimate user instructions or prior tool outputs. To enhance attack transferability against black-box agents, Phantom introduces a novel attack template search framework. We first perform multi-level template augmentation to increa...

Related Articles

Llms

Artificial intelligence will always depends on human otherwise it will be obsolete.

I was looking for a tool for my specific need. There was not any. So i started to write the program in python, just basic structure. Then...

Reddit - Artificial Intelligence · 1 min ·
Llms

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min ·
Llms

Fake users generated by AI can't simulate humans — review of 182 research papers. Your thoughts?

https://www.researchsquare.com/article/rs-9057643/v1 There’s a massive trend right now where tech companies, businesses, even researchers...

Reddit - Artificial Intelligence · 1 min ·
Llms

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime