[2603.02217] Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression

arXiv - AI · 3 min read

About this article

Computer Science > Machine Learning
arXiv:2603.02217 (cs) [Submitted on 10 Feb 2026]

Title: Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression
Authors: Sieun Hyeon, Jaeyoung Do

Abstract: Mixture-of-Experts (MoE) models scale capacity efficiently, but their massive parameter footprint creates a deployment-time memory bottleneck. We organize retraining-free MoE compression into three paradigms (Expert Pruning, Expert Editing, and Expert Merging) and show that persistent post-compression degradation largely stems from a neglected factor: router-expert mismatch, which arises when experts are changed but the router is left untouched. We argue that effective retraining-free compression should avoid updating expert parameters while allowing lightweight router calibration. To this end, we propose Router Knowledge Distillation (Router KD), which updates only a tiny fraction of parameters (the router) by distilling the original model's next-token distribution on unlabeled calibration data. Experiments across representative methods in all three paradigms demonstrate consistent performance recovery, with substantially larger gains in fine-grained MoEs (many small experts) than in coarse-grained MoEs, owing to the more complex routing decision boundaries of fine-grained models.

Subjects: Machine Learning (cs.LG)
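The Router KD recipe described in the abstract is simple enough to sketch. Below is a minimal, hypothetical PyTorch illustration, not the authors' released code: freeze all of the compressed model's weights except the router, then minimize the KL divergence between the original model's next-token distribution and the compressed model's on unlabeled calibration text. The function names, the "router"/"gate" parameter-name filter, and the temperature value are all assumptions.

import torch
import torch.nn.functional as F

def freeze_all_but_router(model):
    """Freeze every parameter except router/gating weights. The name filter
    ('router' / 'gate') is an assumption; real MoE codebases differ."""
    for name, param in model.named_parameters():
        param.requires_grad = ("router" in name) or ("gate" in name)

def router_kd_step(teacher, student, optimizer, input_ids, temperature=2.0):
    """One Router KD step: match the frozen teacher's next-token distribution
    on a batch of unlabeled calibration token ids (shape [batch, seq_len])."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits   # original, uncompressed MoE
    student_logits = student(input_ids).logits       # compressed MoE; only router trains

    # KL(teacher || student) over softened next-token distributions; the T^2
    # factor keeps gradient magnitudes comparable across temperatures.
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: `teacher` and `student` are assumed to be HF-style causal LMs
# whose forward pass returns an object with a `.logits` field.
# freeze_all_but_router(student)
# optimizer = torch.optim.AdamW(
#     [p for p in student.parameters() if p.requires_grad], lr=1e-4
# )
# for input_ids in calibration_loader:   # unlabeled calibration data
#     router_kd_step(teacher, student, optimizer, input_ids)

Because only the router weights receive gradients, the trainable-parameter count and optimizer state stay tiny relative to full retraining, which is what makes the calibration "lightweight" in the paper's sense.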

Originally published on March 04, 2026. Curated by AI News.

