[2509.24198] Negative Pre-activations Differentiate Syntax
Computer Science > Machine Learning
arXiv:2509.24198 (cs)
[Submitted on 29 Sep 2025 (v1), last revised 1 Mar 2026 (this version, v2)]

Title: Negative Pre-activations Differentiate Syntax
Authors: Linghao Kong, Angelina Ning, Micah Adler, Nir Shavit

Abstract: Modern large language models increasingly use smooth activation functions such as GELU or SiLU, allowing negative pre-activations to carry both signal and gradient. Nevertheless, many neuron-level interpretability analyses have historically focused on large positive activations, often implicitly treating the negative region as less informative, a carryover from the ReLU era. We challenge this assumption and ask whether and how models leverage negative pre-activations. We address this question by studying a sparse subpopulation of Wasserstein neurons, whose output distributions deviate strongly from a Gaussian baseline and which functionally differentiate similar inputs. We show that the negative region plays an active role rather than reflecting a mere side effect of gradient optimization. A minimal, sign-specific intervention that zeroes only the negative pre-activations of a small set of Wasserstein neurons substantially increases perplexity and sharply degrades grammatical performance on BLiMP and TSE, whereas both random and perplexity-matched ablations of many more non-Wasserstein ...
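The abstract names two concrete ingredients: flagging Wasserstein neurons by how far each neuron's pre-activation distribution sits from a Gaussian baseline, and a sign-specific ablation that zeroes only the negative pre-activations of those neurons while leaving positive values untouched. The sketch below shows one plausible way to implement both steps in PyTorch; the distance normalization, the neuron count, the layer path, and the `collect_preactivations` helper are assumptions made for illustration, not the authors' released code.

```python
# Minimal sketch, assuming access to a transformer's MLP pre-activations.
import numpy as np
import torch
from scipy.stats import wasserstein_distance


def gaussian_deviation(preacts: np.ndarray) -> float:
    """1-D Wasserstein distance between a neuron's empirical pre-activation
    distribution and a Gaussian baseline with matched mean and std."""
    mu, sigma = preacts.mean(), preacts.std()
    baseline = np.random.normal(mu, sigma, size=preacts.shape[0])
    return wasserstein_distance(preacts, baseline)


def find_wasserstein_neurons(preacts: np.ndarray, top_k: int = 16) -> np.ndarray:
    """preacts: [num_tokens, num_neurons] pre-activations from one MLP layer.
    Returns indices of the top_k neurons farthest from the Gaussian baseline."""
    scores = np.array([gaussian_deviation(preacts[:, j])
                       for j in range(preacts.shape[1])])
    return np.argsort(scores)[-top_k:]


def make_sign_specific_hook(neuron_idx: torch.Tensor):
    """Forward hook for the module producing MLP pre-activations: zero only the
    negative pre-activations of the selected neurons, keep everything else."""
    def hook(module, inputs, output):
        out = output.clone()
        sel = out[..., neuron_idx]
        out[..., neuron_idx] = torch.where(sel < 0, torch.zeros_like(sel), sel)
        return out
    return hook


# Usage sketch (layer path is hypothetical and model-dependent):
# preacts = collect_preactivations(model, dataloader, layer=10)  # assumed helper
# idx = torch.tensor(find_wasserstein_neurons(preacts, top_k=16))
# handle = model.transformer.h[10].mlp.c_fc.register_forward_hook(
#     make_sign_specific_hook(idx))
# ... evaluate perplexity / BLiMP / TSE with the hook active ...
# handle.remove()
```

The control conditions described in the abstract (random and perplexity-matched ablations of non-Wasserstein neurons) would reuse the same hook with different neuron index sets.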