[2604.02608] Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens
Computer Science > Machine Learning
arXiv:2604.02608 (cs)
[Submitted on 3 Apr 2026]

Title: Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens
Authors: Mohammed Suhail B Nadaf

Abstract: Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream. We hypothesized that FV steering failures reflect an absence of task-relevant information: the logit lens would fail alongside steering. We were wrong. In the most comprehensive cross-template FV transfer study to date -- 4,032 pairs across 12 tasks, 6 models from 3 families (Llama-3.1-8B, Gemma-2-9B, Mistral-7B-v0.3; base and instruction-tuned), and 8 templates per task -- we find the opposite dissociation: FV steering succeeds even when the logit lens cannot decode the correct answer at any layer. This steerability-without-decodability pattern is universal: steering exceeds logit lens accuracy for every task on every model, with gaps as large as -0.91. Only 3 of 72 task-model instances show the predicted decodable-without-steerable pattern, all in Mistral. FV vocabulary projection reveals that FVs achieving over 0.90 steering accuracy still project to incoherent token distributions, indicating FVs encode computational instructions rath...
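The three operations the abstract contrasts -- extracting a mean-difference function vector, adding it to the residual stream to steer, and projecting it through the unembedding matrix (the logit lens) -- can be sketched with toy tensors. This is a minimal illustration, not the paper's implementation: the random matrices stand in for a real transformer's hidden states and unembedding, and all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 64, 100

# Toy stand-ins for residual-stream activations at some layer:
# one batch collected with in-context demonstrations, one without.
h_icl = rng.normal(size=(32, d_model))   # hidden states with ICL demos
h_zero = rng.normal(size=(32, d_model))  # hidden states without demos

# Function vector: the mean-difference direction between conditions.
fv = h_icl.mean(axis=0) - h_zero.mean(axis=0)

# Steering: add the FV to a residual-stream activation during a forward pass.
h = rng.normal(size=d_model)
h_steered = h + fv

# Logit lens: project an activation through the (toy) unembedding matrix
# to get a token distribution; the paper asks whether the FV itself
# decodes to a coherent distribution this way.
W_U = rng.normal(size=(d_model, vocab))

def logit_lens(x):
    logits = x @ W_U
    z = np.exp(logits - logits.max())
    return z / z.sum()

p_fv = logit_lens(fv)  # token distribution the FV projects to
```

The paper's finding, in these terms, is that adding `fv` can change model behavior in the intended way even when `p_fv` (and the logit-lens readout at every layer) assigns no coherent mass to the correct answer tokens.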