[2506.13734] Instruction Following by Principled Boosting Attention of Large Language Models
Computer Science > Computation and Language
arXiv:2506.13734 (cs)
[Submitted on 16 Jun 2025 (v1), last revised 26 Mar 2026 (this version, v3)]

Title: Instruction Following by Principled Boosting Attention of Large Language Models
Authors: Vitoria Guardieiro, Avishree Khare, Adam Stein, Eric Wong

Abstract: Large language models' behavior is often shaped by instructions such as system prompts, refusal boundaries, privacy constraints, and tool-use rules that must hold at inference time. Yet in practice these constraints can be violated under long contexts or when user-provided context conflicts with them, creating reliability and safety risks. This motivates inference-time interventions that strengthen instruction influence without retraining. One such intervention is attention steering, which biases attention toward instruction tokens. In this work, we present a unifying theory for attention steering methods by formalizing instruction following as rule-based competition between instruction rules and context-derived rules, with attention mediating which rules dominate. We prove that boosting attention to instruction tokens tilts this competition, making it harder for context to override instruction-following. However, excessive boosting can suppress task-relevant context that should be incorporated ...
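The core intervention the abstract describes, biasing attention toward instruction tokens, can be sketched in a few lines. The snippet below is a minimal single-head illustration, not the paper's implementation: `boosted_attention`, `instruction_mask`, and the strength parameter `beta` are hypothetical names, and real methods operate inside a transformer's attention layers rather than on a standalone score matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def boosted_attention(scores, instruction_mask, beta):
    """Add a bias beta to the attention logits at instruction-token
    positions before the softmax, shifting attention mass toward the
    instruction. beta = 0 recovers ordinary attention."""
    return softmax(scores + beta * instruction_mask, axis=-1)

# Toy example: one query attending over 5 key positions,
# where the first 2 positions are instruction tokens.
scores = np.array([[1.0, 0.5, 2.0, 1.5, 0.2]])
mask = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

base = boosted_attention(scores, mask, beta=0.0)
boosted = boosted_attention(scores, mask, beta=2.0)

# Attention mass on the instruction tokens grows with beta.
print(base[0, :2].sum(), boosted[0, :2].sum())
```

As the abstract cautions, pushing `beta` too high starves the context positions of attention mass, which is the over-boosting failure mode the paper's theory characterizes.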