[2509.22299] HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space
arXiv:2509.22299 (cs), Computer Science: Machine Learning
[Submitted on 26 Sep 2025 (v1), last revised 28 Feb 2026 (this version, v2)]
Authors: Ke Li, Zheng Yang, Zhongbin Zhou, Feng Xue, Zhonglin Jiang, Wenxiao Wang

Abstract: Mixture-of-Experts (MoE) architectures in large language models (LLMs) deliver exceptional performance and reduced inference costs compared to dense LLMs. However, their large parameter counts result in prohibitive memory requirements, limiting practical deployment. While existing pruning methods primarily focus on expert-level pruning, this coarse granularity often leads to substantial accuracy degradation. In this work, we introduce HEAPr, a novel pruning algorithm that decomposes experts into smaller, indivisible atomic experts, enabling more precise and flexible atomic expert pruning. To measure the importance of each atomic expert, we leverage second-order information based on principles similar to the Optimal Brain Surgeon theory. To address the computational and storage challenges posed by second-order information, HEAPr exploits the inherent properties of atomic experts to transform the second-order information from expert parameters into that of atomic expert parameters, and further simplifies it to the second-ord...
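The abstract is cut off before the exact scoring rule is stated, so the following is only a minimal sketch under stated assumptions, not HEAPr's implementation. It assumes each expert is a SwiGLU feed-forward block and treats each intermediate neuron (one row of the gate and up projections plus the matching column of the down projection) as an "atomic expert"; because the activation is elementwise, the expert output is exactly the sum of these atomic outputs. For the second-order importance it swaps in a generic Optimal-Brain-Surgeon-style saliency in output space, approximating the output Hessian with an empirical Fisher matrix. All function names (`atomic_outputs`, `fisher_saliency`, `prune_atoms`) are hypothetical.

```python
import numpy as np

def silu(x):
    # SiLU activation, as used in SwiGLU feed-forward experts.
    return x / (1.0 + np.exp(-x))

def expert_forward(x, w_gate, w_up, w_down):
    """Full SwiGLU expert. x: (T, d); w_gate, w_up: (m, d); w_down: (d, m)."""
    h = silu(x @ w_gate.T) * (x @ w_up.T)  # (T, m) intermediate activations
    return h @ w_down.T                    # (T, d)

def atomic_outputs(x, w_gate, w_up, w_down):
    """Hypothetical decomposition: atom j outputs h[:, j] * w_down[:, j].
    Returns shape (m, T, d); summing over axis 0 gives the expert output."""
    h = silu(x @ w_gate.T) * (x @ w_up.T)      # (T, m) per-neuron coefficients
    return np.einsum("tm,dm->mtd", h, w_down)  # (m, T, d)

def fisher_saliency(atoms, grad_out):
    """Assumed OBS-style score: deleting atom j perturbs the output by
    delta = -o_j. With the output Hessian approximated by an empirical
    Fisher (per-token g_t g_t^T), 0.5 * delta^T H delta reduces to
    0.5 * sum_t (g_t . o_{j,t})^2. grad_out: (T, d) loss gradients."""
    proj = np.einsum("mtd,td->mt", atoms, grad_out)  # (m, T)
    return 0.5 * np.sum(proj ** 2, axis=1)           # (m,) one score per atom

def prune_atoms(w_gate, w_up, w_down, scores, keep_ratio):
    """Keep the highest-scoring atomic experts and return slimmer weights."""
    m = scores.shape[0]
    k = max(1, int(round(keep_ratio * m)))
    keep = np.sort(np.argsort(scores)[-k:])  # kept neuron indices, original order
    return w_gate[keep], w_up[keep], w_down[:, keep]

# Toy usage with random weights and random stand-in calibration gradients.
rng = np.random.default_rng(0)
T, d, m = 16, 8, 32
x = rng.standard_normal((T, d))
w_gate = rng.standard_normal((m, d)) * 0.1
w_up = rng.standard_normal((m, d)) * 0.1
w_down = rng.standard_normal((d, m)) * 0.1

# Sanity check: the atomic experts sum back to the full expert output.
atoms = atomic_outputs(x, w_gate, w_up, w_down)
assert np.allclose(atoms.sum(axis=0), expert_forward(x, w_gate, w_up, w_down))

grad_out = rng.standard_normal((T, d))
scores = fisher_saliency(atoms, grad_out)
g2, u2, d2 = prune_atoms(w_gate, w_up, w_down, scores, keep_ratio=0.75)
print(g2.shape, u2.shape, d2.shape)  # (24, 8) (24, 8) (8, 24)
```

Because pruning an atomic expert only deletes one neuron's slice of the three weight matrices, the pruned expert remains a dense SwiGLU block of smaller width, which is what makes this granularity finer than dropping whole experts.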