[2510.04309] Activation Steering with a Feedback Controller
Computer Science > Machine Learning
arXiv:2510.04309 (cs)
[Submitted on 5 Oct 2025 (v1), last revised 27 Mar 2026 (this version, v2)]
Title: Activation Steering with a Feedback Controller
Authors: Dung V. Nguyen, Hieu M. Vu, Nhi Y. Pham, Lei Zhang, Tan M. Nguyen
Abstract: Controlling the behavior of large language models (LLMs) is fundamental to their safety alignment and reliable deployment. However, existing steering methods are primarily driven by empirical insights and lack theoretical performance guarantees. In this work, we develop a control-theoretic foundation for activation steering by showing that popular steering methods correspond to proportional (P) controllers, with the steering vector serving as the feedback signal. Building on this finding, we propose Proportional-Integral-Derivative (PID) Steering, a principled framework that leverages the full PID controller for activation steering in LLMs. The proportional (P) term aligns activations with target semantic directions, the integral (I) term accumulates errors to enforce persistent corrections across layers, and the derivative (D) term mitigates overshoot by counteracting rapid activation changes. This closed-loop design yields interpretable error dynamics and connects activation steering to classical stability guarantees in control theory. Moreover, PID Steering is lightweight, modu...
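
The roles of the three terms described in the abstract can be illustrated with a textbook discrete-time PID update in which the layer index plays the role of time. The following is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `pid_controls`, the gains `kp`, `ki`, `kd`, and the choice to treat each layer's steering vector as the error signal are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def pid_controls(
    errors: Sequence[Sequence[float]],
    kp: float,
    ki: float,
    kd: float,
) -> List[Vector]:
    """Return one control vector per layer from per-layer error vectors.

    Assumption: errors[l] is the feedback signal at layer l (for example,
    a steering vector). This is a generic PID update, not the paper's code.
    """
    if not errors:
        return []
    dim = len(errors[0])
    integral = [0.0] * dim   # running sum of errors over layers (I term)
    previous = None          # error at the previous layer (for the D term)
    controls: List[Vector] = []
    for error in errors:
        if len(error) != dim:
            raise ValueError("all error vectors must have the same length")
        integral = [s + e for s, e in zip(integral, error)]
        if previous is None:
            # No earlier layer to difference against, so the D term is zero.
            derivative = [0.0] * dim
        else:
            derivative = [e - p for e, p in zip(error, previous)]
        controls.append(
            [kp * e + ki * s + kd * d
             for e, s, d in zip(error, integral, derivative)]
        )
        previous = list(error)
    return controls


# Two layers, two-dimensional activations, hypothetical gains.
example_errors = [[1.0, 0.0], [0.5, 0.0]]
print(pid_controls(example_errors, kp=1.0, ki=0.5, kd=0.25))
# [[1.5, 0.0], [1.125, 0.0]]
```

With `ki = 0` and `kd = 0` the control reduces to a scaled copy of each layer's error vector, which mirrors the abstract's claim that popular steering methods act as proportional controllers. In an actual model, each control vector would be added to the corresponding layer's activations; how the error signal is measured per layer is left open here.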