[2502.06809] Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
Computer Science > Machine Learning
arXiv:2502.06809 (cs)
[Submitted on 4 Feb 2025 (v1), last revised 10 Apr 2026 (this version, v3)]

Title: Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
Authors: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, Peizhong Ju, A.B. Siddique

Abstract: Pervasive polysemanticity in large language models (LLMs) undermines discrete neuron-concept attribution, posing a significant challenge for model interpretation and control. We systematically analyze both encoder- and decoder-based LLMs across diverse datasets and observe that even highly salient neurons for specific semantic concepts consistently exhibit polysemantic behavior. Importantly, we uncover a consistent pattern: concept-conditioned activation magnitudes of neurons form distinct, often Gaussian-like distributions with minimal overlap. Building on this observation, we hypothesize that interpreting and intervening on concept-specific activation ranges can enable more precise interpretability and targeted manipulation in LLMs. To this end, we introduce NeuronLens, a novel range-based interpretation and manipulation framework that localizes concept attribution to activation ranges within a neuron. Extensive empirical evaluations show that range-based interventions enable effective ma...
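The core idea, per the abstract, is that a single neuron's concept-conditioned activation magnitudes cluster into Gaussian-like distributions with little overlap, so a concept can be attributed to an activation range rather than to the whole neuron. The sketch below is a minimal, hypothetical illustration of that idea (not the paper's actual implementation): it fits a mean and standard deviation per concept, defines each concept's range as mean ± k standard deviations, and suppresses a neuron's activation only when it falls inside the targeted concept's range.

```python
import numpy as np

def concept_ranges(activations_by_concept, k=2.0):
    """Fit a Gaussian to each concept's activation magnitudes for one
    neuron and return a (low, high) range of mean +/- k std deviations."""
    ranges = {}
    for concept, acts in activations_by_concept.items():
        acts = np.asarray(acts, dtype=float)
        mu, sigma = acts.mean(), acts.std()
        ranges[concept] = (mu - k * sigma, mu + k * sigma)
    return ranges

def suppress_in_range(activation, rng):
    """Zero a neuron's activation only when it lies inside the target
    concept's range, leaving other concepts' signal untouched."""
    low, high = rng
    return 0.0 if low <= activation <= high else activation

# Toy data: one polysemantic neuron fires at different magnitudes for
# two hypothetical concepts, with minimal overlap between the ranges.
gen = np.random.default_rng(0)
acts = {
    "sports":   gen.normal(1.0, 0.2, 500),
    "politics": gen.normal(3.0, 0.3, 500),
}
ranges = concept_ranges(acts)

# An activation typical of "politics" (~3.1) falls in that concept's
# range and is suppressed; one typical of "sports" (~1.0) passes through.
print(suppress_in_range(3.1, ranges["politics"]))  # 0.0
print(suppress_in_range(1.0, ranges["politics"]))  # 1.0
```

Because the two concept ranges barely overlap, zeroing activations inside one concept's range leaves the neuron's contribution to the other concept intact, which is the motivation for range-based rather than whole-neuron interventions.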