[2603.24652] Demystifying When Pruning Works via Representation Hierarchies
arXiv:2603.24652 (cs) [Submitted on 25 Mar 2026]
Computer Science > Computation and Language

Title: Demystifying When Pruning Works via Representation Hierarchies
Authors: Shwai He, Guoheng Sun, Haichao Zhang, Yun Fu, Ang Li

Abstract: Network pruning, which removes less important parameters or architectures, is often expected to improve efficiency while preserving performance. However, this expectation does not consistently hold across language tasks: pruned models can perform well on non-generative tasks but frequently fail in generative settings. To understand this discrepancy, we analyze network pruning from a representation-hierarchy perspective, decomposing the internal computation of language models into three sequential spaces: embedding (hidden representations), logit (pre-softmax outputs), and probability (post-softmax distributions). We find that representations in the embedding and logit spaces are largely robust to pruning-induced perturbations. However, the nonlinear transformation from logits to probabilities amplifies these deviations, which accumulate across time steps and lead to substantial degradation during generation. In contrast, the stability of the categorical-token probability subspace, together with the robustness of the embedding space, supports the effectiveness of pruning for non-generative tasks...
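The abstract's central mechanism, that a small deviation in logit space can be amplified by the softmax nonlinearity, can be illustrated with a minimal numerical sketch. The logit values and the perturbation below are hypothetical and are not taken from the paper; they only demonstrate the qualitative effect: a modest relative change in logits produces a much larger relative change in probabilities and can even flip the top token, which in autoregressive decoding alters the next input and compounds over time steps.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical clean logits and a small pruning-style perturbation.
logits = np.array([5.0, 4.8, 1.0, 0.5])
perturbed = logits + np.array([-0.3, 0.3, 0.0, 0.0])

p = softmax(logits)     # clean next-token distribution
q = softmax(perturbed)  # distribution after the logit deviation

# Relative deviation in logit space vs. probability space.
logit_rel = np.abs(perturbed - logits).max() / np.abs(logits).max()
prob_rel = np.abs(q - p).max() / p.max()

print(f"logit deviation:       {logit_rel:.2f}")   # ~0.06
print(f"probability deviation: {prob_rel:.2f}")    # noticeably larger
print("argmax flipped:", p.argmax() != q.argmax())
```

In this toy setting the top-ranked token changes even though the logit perturbation is only a few percent of the logit magnitude; during generation such a flip feeds a different token back into the model, which is one way per-step deviations can accumulate.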