[2601.04548] Identifying Good and Bad Neurons for Task-Level Controllable LLMs
Computer Science > Computation and Language

arXiv:2601.04548 (cs)

[Submitted on 8 Jan 2026 (v1), last revised 5 Mar 2026 (this version, v2)]

Title: Identifying Good and Bad Neurons for Task-Level Controllable LLMs

Authors: Wenjie Li, Guansong Pang, Hezhe Qiao, Debin Gao, David Lo

Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities on multiple-choice question answering benchmarks, but the complex mechanisms underlying their large-scale neuron populations remain opaque, posing significant challenges for understanding and steering LLMs. While recent studies have made progress on identifying the neurons responsible for certain abilities, these ability-specific methods are infeasible for task-focused scenarios that require the coordinated use of multiple abilities. Moreover, these approaches focus only on supportive neurons that correlate positively with task completion, neglecting both neurons with other roles (such as inhibitive ones) and misattribution caused by fortuitous behaviors in LLMs (i.e., answering questions correctly by chance rather than through genuine understanding). To address these challenges, we propose NeuronLLM, a novel task-level LLM understanding framework that adopts the biological principle of functional antagonism for LLM neuron identification. The key insight is that task performance is jointly determ...