[2507.00390] MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE

arXiv - Machine Learning · 4 min read

Summary

The paper introduces MoNE, a novel method for structured pruning of Mixture-of-Experts (MoE) models, replacing redundant experts with lightweight novices to enhance model efficiency while minimizing performance degradation.

Why It Matters

As large language models grow in complexity, efficient resource management becomes crucial. MoNE addresses the memory overhead associated with MoE models by proposing a method that maintains performance while reducing redundancy. This innovation is significant for researchers and practitioners aiming to optimize AI models without sacrificing accuracy.

Key Takeaways

  • MoNE effectively replaces redundant experts in MoE models with lightweight novices.
  • The method demonstrates minimal accuracy degradation while achieving significant memory savings.
  • Extensive experiments show MoNE outperforms existing pruning methods across multiple tasks.

Computer Science > Machine Learning — arXiv:2507.00390 (cs)
[Submitted on 1 Jul 2025 (v1), last revised 22 Feb 2026 (this version, v2)]

Authors: Geng Zhang, Yuxuan Han, Yuxuan Lou, Yiqi Zhang, Wangbo Zhao, Yang You

Abstract: Mixture-of-Experts (MoE) enables efficient scaling of large language models by activating only a subset of experts per input token. However, deploying MoE-based models incurs significant memory overhead due to the need to retain all experts in memory. While structured pruning is promising for reducing memory costs, existing methods often show suboptimal performance and unstable degradation across three dimensions: model architectures, calibration data sources, and calibration sample sizes. This paper proposes Mixture-of-Novices-and-Experts (MoNE), a novel expert pruning method that replaces redundant experts with lightweight novices to achieve effective and robust model compression. MoNE evaluates expert redundancy based on two metrics: access frequency and output variance. Experts exhibiting low usage and stable outputs are pruned and replaced with lightweight novices (unbiased estimations of their original outputs), minimizing performance degradation. Extensive experiments demonstrate that ...
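The abstract describes two redundancy metrics (access frequency and output variance) and a replacement rule (swap a pruned expert for an unbiased estimate of its output). The following is a minimal, hypothetical sketch of that idea, assuming per-token routing records and expert outputs collected on a calibration set; the names `score_experts` and `Novice` and all thresholds are illustrative, not from the paper.

```python
import numpy as np

def score_experts(router_choices, expert_outputs, n_experts):
    """Score each expert by access frequency and output variance.

    router_choices: 1-D array of chosen expert ids, one per token.
    expert_outputs: dict expert_id -> array (n_tokens_e, d) of outputs.
    Returns arrays (frequency, variance) indexed by expert id.
    """
    counts = np.bincount(router_choices, minlength=n_experts)
    freq = counts / max(len(router_choices), 1)
    var = np.zeros(n_experts)
    for e, outs in expert_outputs.items():
        if len(outs) > 1:
            # Mean per-dimension variance of this expert's calibration outputs.
            var[e] = outs.var(axis=0).mean()
    return freq, var

class Novice:
    """Replace a pruned expert with its mean calibration output,
    an unbiased estimate of what the expert would have produced."""

    def __init__(self, mean_output):
        self.mean_output = np.asarray(mean_output)

    def __call__(self, x):
        # Ignore the input; emit the stored estimate for every token.
        return np.broadcast_to(self.mean_output,
                               x.shape[:-1] + self.mean_output.shape)

# Usage: prune experts that are both rarely used and output-stable.
rng = np.random.default_rng(0)
choices = rng.integers(0, 4, size=1000)
outputs = {e: rng.normal(size=(int((choices == e).sum()), 8))
           for e in range(4)}
freq, var = score_experts(choices, outputs, n_experts=4)
prune_mask = (freq < 0.2) & (var < 0.1)
novices = {e: Novice(outputs[e].mean(axis=0))
           for e in range(4) if prune_mask[e]}
```

The thresholds above are placeholders; the paper's actual criterion for combining the two metrics is not specified in this summary.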

