[2603.02217] Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression
Computer Science > Machine Learning

arXiv:2603.02217 (cs)
[Submitted on 10 Feb 2026]

Title: Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression
Authors: Sieun Hyeon, Jaeyoung Do

Abstract: Mixture-of-Experts (MoE) models scale capacity efficiently, but their massive parameter footprint creates a deployment-time memory bottleneck. We organize retraining-free MoE compression into three paradigms (Expert Pruning, Expert Editing, and Expert Merging) and show that persistent post-compression degradation largely stems from a neglected factor: router-expert mismatch when experts are changed but the router is left untouched. We argue that effective retraining-free compression should avoid updating expert parameters while allowing lightweight router calibration. To this end, we propose Router Knowledge Distillation (Router KD), which updates only a tiny fraction of parameters (the router) by distilling the original model's next-token distribution on unlabeled calibration data. Experiments across representative methods in all three paradigms demonstrate consistent performance recovery, with substantially larger gains in fine-grained MoEs (many small experts) than in coarse-grained MoEs, owing to their more complex routing decision boundaries.

Subjects: Machine Learning
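The core idea of Router KD, as described in the abstract, is to freeze the (compressed) experts and tune only the router's parameters so that the compressed model's next-token distribution matches the original model's, via a KL distillation loss on unlabeled calibration data. The following is a minimal toy sketch of that idea, not the paper's implementation: two frozen "experts" each emit fixed vocabulary logits, the teacher distribution stands in for the original model's output, and finite-difference gradient descent updates only the router logits. All shapes, values, and helper names are illustrative assumptions.

```python
# Toy sketch of router-only calibration via KL distillation (Router KD idea).
# Experts are frozen; only the router's mixing logits are trained to match
# a teacher next-token distribution. Numbers here are illustrative only.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def kl(p, q):
    # KL(p || q): the distillation loss between teacher p and student q.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Frozen expert outputs: vocab logits from two experts (never updated).
EXPERTS = [[2.0, 0.5, -1.0], [-0.5, 1.5, 0.5]]

# Teacher next-token distribution on one unlabeled calibration token.
TEACHER = softmax([1.0, 1.2, -0.8])

def student_dist(router_logits):
    # Gate the frozen experts with the router, then softmax over the vocab.
    gates = softmax(router_logits)
    mixed = [sum(g * e[v] for g, e in zip(gates, EXPERTS))
             for v in range(len(EXPERTS[0]))]
    return softmax(mixed)

def calibrate(router_logits, steps=200, lr=0.5, eps=1e-4):
    # Gradient descent on the router alone; gradients by finite differences,
    # so no autograd framework is needed for this toy example.
    r = list(router_logits)
    for _ in range(steps):
        grads = []
        for i in range(len(r)):
            r_hi = list(r); r_hi[i] += eps
            r_lo = list(r); r_lo[i] -= eps
            g = (kl(TEACHER, student_dist(r_hi)) -
                 kl(TEACHER, student_dist(r_lo))) / (2 * eps)
            grads.append(g)
        r = [ri - lr * gi for ri, gi in zip(r, grads)]
    return r

before = kl(TEACHER, student_dist([0.0, 0.0]))
r_new = calibrate([0.0, 0.0])
after = kl(TEACHER, student_dist(r_new))
print(f"KL before calibration: {before:.4f}, after: {after:.4f}")
```

The structure mirrors the paper's constraint: the expert parameters (`EXPERTS`) never change, and only the router logits move, which is why the update touches such a tiny fraction of parameters.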