[2604.09443] Many-Tier Instruction Hierarchy in LLM Agents
Computer Science > Computation and Language
arXiv:2604.09443 (cs)
[Submitted on 10 Apr 2026]

Title: Many-Tier Instruction Hierarchy in LLM Agents
Authors: Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi

Abstract: Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying different levels of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a fixed, small set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving instruction conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges, comprising 853 agentic tasks (427 coding and 426 instruction-following). ManyIH-Bench composes constraints developed by LLMs and verified by humans to create re...
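
The rule stated in the abstract, that the highest-privilege instruction wins when instructions conflict, can be illustrated with a minimal sketch. This is not the paper's method or the ManyIH-Bench format. The `Instruction` fields, the `resolve` helper, the convention that a larger integer means higher privilege, and the first-seen tie-break are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str     # hypothetical label for where the instruction came from
    privilege: int  # assumption: a larger number means higher privilege
    topic: str      # what the instruction constrains; same topic = conflict
    directive: str  # the instruction text itself


def resolve(instructions):
    """Keep, for each topic, the instruction with the highest privilege.

    Ties keep the instruction seen first (an arbitrary choice for this sketch).
    """
    winners = {}
    for ins in instructions:
        current = winners.get(ins.topic)
        if current is None or ins.privilege > current.privilege:
            winners[ins.topic] = ins
    return winners


# Hypothetical instructions spread over several privilege levels.
example = [
    Instruction("tool output", 1, "language", "Answer in French."),
    Instruction("user prompt", 5, "language", "Answer in English."),
    Instruction("system message", 9, "length", "Use at most 50 words."),
    Instruction("retrieved document", 2, "length", "Write at least 500 words."),
]

resolved = resolve(example)
for topic, ins in resolved.items():
    print(f"{topic}: {ins.directive} (from {ins.source}, level {ins.privilege})")
```

Because privilege is a plain integer rather than a fixed role label, the same function handles 3 levels or 12 without modification, which is the contrast the abstract draws with fixed-role hierarchies.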