[2603.02578] How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Computer Science > Computation and Language
arXiv:2603.02578 (cs) [Submitted on 3 Mar 2026]

Title: How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
Authors: Ziwen Xu, Kewei Xu, Haoming Xu, Haiwen Hong, Longtao Huang, Hui Xue, Ningyu Zhang, Yongliang Shen, Guozhou Zheng, Huajun Chen, Shumin Deng

Abstract: Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction...