[2604.04756] Darkness Visible: Reading the Exception Handler of a Language Model
Computer Science > Machine Learning
arXiv:2604.04756 (cs)
[Submitted on 6 Apr 2026]

Title: Darkness Visible: Reading the Exception Handler of a Language Model
Authors: Peter Balogh

Abstract: The final MLP of GPT-2 Small exhibits a fully legible routing program -- 27 named neurons organized into a three-tier exception handler -- while the knowledge it routes remains entangled across ~3,040 residual neurons. We decompose all 3,072 neurons (to numerical precision) into: 5 fused Core neurons that reset vocabulary toward function words, 10 Differentiators that suppress wrong candidates, 5 Specialists that detect structural boundaries, and 7 Consensus neurons that each monitor a distinct linguistic dimension. The consensus-exception crossover -- where MLP intervention shifts from helpful to harmful -- is statistically sharp (bootstrap 95% CIs exclude zero at all consensus levels; crossover between 4/7 and 5/7). Three experiments show that "knowledge neurons" (Dai et al., 2022), at L11 of this model, function as routing infrastructure rather than fact storage: the MLP amplifies or suppresses signals already present in the residual stream from attention, scaling with contextual constraint. A garden-path experiment reveals a reversed garden-path effect -- GPT-2 uses verb subcategorization immediately, consistent with the exception handler ope...
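As a rough illustration of why a neuron-level decomposition can hold "to numerical precision": an MLP's output projection is a sum of per-neuron terms (activation times that neuron's output weight row), so any partition of the neurons yields group contributions that add back to the full output. The sketch below is not the paper's code. It uses random toy activations and weights, a toy model width of 8, and omits the output bias. The group sizes (5, 10, 5, 7) and the 3,072 total come from the abstract; which neuron indices belong to which group is a purely hypothetical assignment made here for illustration.

```python
import random

N_NEURONS = 3072  # MLP hidden width stated in the abstract
D_MODEL = 8       # toy width for speed (assumption; GPT-2 Small uses 768)

# Group sizes from the abstract; the index assignment below is hypothetical.
GROUP_SIZES = {"core": 5, "differentiators": 10, "specialists": 5, "consensus": 7}


def make_groups(n_neurons, sizes):
    """Give each named group consecutive indices; everything else is 'residual'."""
    groups, start = {}, 0
    for name, size in sizes.items():
        groups[name] = range(start, start + size)
        start += size
    groups["residual"] = range(start, n_neurons)
    return groups


def contribution(acts, w_out, indices):
    """Sum of acts[i] * w_out[i] over the given neuron indices."""
    out = [0.0] * len(w_out[0])
    for i in indices:
        a, row = acts[i], w_out[i]
        for j in range(len(out)):
            out[j] += a * row[j]
    return out


rng = random.Random(0)
# Stand-ins for post-activation values and output weights (random, not real).
acts = [rng.gauss(0, 1) for _ in range(N_NEURONS)]
w_out = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(N_NEURONS)]

groups = make_groups(N_NEURONS, GROUP_SIZES)
parts = {name: contribution(acts, w_out, idx) for name, idx in groups.items()}

full = contribution(acts, w_out, range(N_NEURONS))
recombined = [sum(p[j] for p in parts.values()) for j in range(D_MODEL)]

# The two differ only by floating-point rounding from reordered summation.
max_err = max(abs(a - b) for a, b in zip(full, recombined))
print({name: len(idx) for name, idx in groups.items()}, max_err)
```

With these sizes the 27 named neurons leave 3,045 in the residual group, which matches the abstract's "~3,040" only approximately. Nothing in this sketch says anything about what the groups do; it only shows that the bookkeeping is exactly additive.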