[2604.04064] Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison
arXiv:2604.04064 (cs) — Computer Science > Computation and Language
[Submitted on 5 Apr 2026]

Title: Extracting and Steering Emotion Representations in Small Language Models: A Methodological Comparison
Authors: Jihoon Jeong

Abstract: Small language models (SLMs) in the 100M-10B parameter range increasingly power production systems, yet whether they possess the internal emotion representations recently discovered in frontier models remains unknown. We present the first comparative analysis of emotion vector extraction methods for SLMs, evaluating 9 models across 5 architectural families (GPT-2, Gemma, Qwen, Llama, Mistral) using 20 emotions and two extraction methods (generation-based and comprehension-based). Generation-based extraction produces statistically superior emotion separation (Mann-Whitney p = 0.007; Cohen's d = -107.5), with the advantage modulated by instruction tuning and architecture. Emotion representations localize at middle transformer layers (~50% depth), following a U-shaped curve that is architecture-invariant from 124M to 3B parameters. We validate these findings against representational anisotropy baselines across 4 models and confirm causal behavioral effects through steering experiments, independently verified by an external emotion classifier (92% success rate, 37/40 scenario...
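The abstract does not give implementation details, but the extraction-and-steering pipeline it describes is commonly realized as difference-of-means activation vectors plus activation addition at a middle layer. The following is a minimal standalone sketch under that assumption; the synthetic hidden states, the `d_model` size, and the `alpha` scale are all illustrative stand-ins, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size; real SLMs use 768-4096 dimensions

# Toy stand-ins for middle-layer (~50% depth) hidden states. In practice
# these would be captured with forward hooks on a real model; here we
# synthesize them so the sketch runs on its own.
neutral_acts = rng.normal(0.0, 1.0, size=(50, d_model))
latent_direction = rng.normal(0.0, 1.0, size=d_model)
emotion_acts = neutral_acts + 2.0 * latent_direction  # e.g. "joy" prompts

def extract_emotion_vector(emo: np.ndarray, neu: np.ndarray) -> np.ndarray:
    """Difference-of-means emotion vector, unit-normalized."""
    v = emo.mean(axis=0) - neu.mean(axis=0)
    return v / np.linalg.norm(v)

def steer(hidden: np.ndarray, v: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Activation addition: shift a hidden state along the emotion vector."""
    return hidden + alpha * v

v = extract_emotion_vector(emotion_acts, neutral_acts)
h = rng.normal(0.0, 1.0, size=d_model)       # some hidden state to steer
h_steered = steer(h, v)

# Steering moves the state toward the emotion direction by exactly alpha
# (since v is unit-norm), which is what a downstream classifier would detect.
print(float(h @ v), float(h_steered @ v))
```

Because `v` is unit-normalized, the projection onto the emotion direction increases by exactly `alpha` after steering; the paper's generation-based vs. comprehension-based comparison differs only in which prompts produce the contrasting activation sets.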