[2605.07631] Inference Time Causal Probing in LLMs
Computer Science > Artificial Intelligence
arXiv:2605.07631 (cs)
[Submitted on 8 May 2026]

Title: Inference Time Causal Probing in LLMs
Authors: Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Matthias Grossglauser

Abstract: Causal probing methods aim to test and control how internal representations influence the behavior of generative models. In causal probing, an intervention modifies hidden states so that a property takes on a different value. Most existing approaches define such interventions by training an auxiliary probe classifier, which ties the method to a specific task or model and risks misalignment with the model's predictive geometry. We propose Hidden-state Driven Margin Intervention (HDMI), a probe-free, gradient-based technique that directly steers hidden states using the model's native output. HDMI applies a margin objective that increases the probability of a target continuation while decreasing that of the source, without relying on probe classifiers. We further introduce a lookahead variant (LA-HDMI) for text editing that backpropagates through the softmax embeddings, modifying the current hidden state so that the likelihood of user-specified tokens increases in next-token generations while preserving fluency. To evaluate interventions, we measure completeness (whether the targeted property changes as intended) a...
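The core idea described in the abstract, steering a hidden state with the gradient of a margin objective on the model's own output distribution, can be illustrated with a minimal sketch. The code below is a hypothetical toy illustration, not the authors' implementation: it assumes a linear readout (unembedding) matrix `W` maps a hidden state `h` to vocabulary logits, and it takes hinge-loss gradient steps until the log-probability of a target token exceeds that of the source token by a chosen margin. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def margin_intervention_step(h, W, src, tgt, margin=2.0, lr=0.1):
    """One gradient step of a probe-free margin objective on a hidden state.

    Minimizes hinge(margin + log p(src) - log p(tgt)). With a linear readout,
    the gradient of (log p(src) - log p(tgt)) w.r.t. h is W[src] - W[tgt],
    because the log-partition terms cancel.
    """
    logits = W @ h
    logp = logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))
    loss = max(0.0, margin + logp[src] - logp[tgt])
    if loss > 0.0:
        h = h - lr * (W[src] - W[tgt])  # descent step on the hidden state
    return h, loss

# Toy example: random readout and hidden state, steer token 3 -> token 7.
rng = np.random.default_rng(0)
d, V = 16, 50              # hidden size and vocabulary size (illustrative)
W = rng.normal(size=(V, d))
h = rng.normal(size=d)
src, tgt = 3, 7
loss = None
for _ in range(100):
    h, loss = margin_intervention_step(h, W, src, tgt)
    if loss == 0.0:
        break
p = softmax(W @ h)
# After steering, the target token is more likely than the source token.
```

In a real transformer the readout from an intermediate hidden state to next-token logits is nonlinear, so the gradient would be obtained by backpropagation rather than this closed form; the hinge structure and the step on the hidden state are the part the sketch is meant to convey.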