[2602.15521] ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
Summary
The paper presents ExpertWeaver, a framework that converts dense LLMs into sparse Mixture-of-Experts (MoE) models by exploiting Gated Linear Unit (GLU) activation patterns, achieving strong performance without extensive retraining.
Why It Matters
As LLMs grow in complexity, efficient scaling is crucial. ExpertWeaver addresses the challenge of converting dense models to MoE architectures, which can improve computational efficiency and model performance, making it relevant for researchers and practitioners in machine learning and AI.
Key Takeaways
- ExpertWeaver utilizes GLU activation patterns for efficient MoE conversion.
- The framework allows for training-free dynamic structural pruning and improved initialization.
- ExpertWeaver outperforms existing dense-to-MoE conversion methods.
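To make the MoE setting concrete, the sketch below shows standard top-k sparse expert routing, where each token activates only a few experts; this illustrates the general mechanism, not ExpertWeaver's specific router, and all dimensions are hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions (illustrative only).
d_model, n_experts, top_k, n_tokens = 16, 8, 2, 4

# A learned router projects each token onto per-expert logits.
W_router = rng.normal(size=(d_model, n_experts))
X = rng.normal(size=(n_tokens, d_model))

logits = X @ W_router                              # (n_tokens, n_experts)

# Sparse activation: keep only the top-k experts per token,
# then softmax over just those k scores.
idx = np.argsort(logits, axis=1)[:, -top_k:]       # chosen expert indices
scores = np.take_along_axis(logits, idx, axis=1)   # their logits
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

print(idx.shape)                # (4, 2): 2 experts chosen per token
print(weights.sum(axis=1))      # each row of mixing weights sums to 1
```

Because only `top_k` of `n_experts` expert FFNs run per token, compute per token stays roughly constant while total parameter count scales with `n_experts`.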
Computer Science > Computation and Language
arXiv:2602.15521 (cs)
[Submitted on 17 Feb 2026]
Title: ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
Authors: Ziyu Zhao, Tong Zhu, Zhi Zhang, Tiantian Fan, Jinluan Yang, Kun Kuang, Zhongyu Wei, Fei Wu, Yu Cheng
Abstract: Mixture-of-Experts (MoE) effectively scales model capacity while preserving computational efficiency through sparse expert activation. However, training high-quality MoEs from scratch is prohibitively expensive. A promising alternative is to convert pretrained dense models into sparse MoEs. Existing dense-to-MoE methods fall into two categories: dynamic structural pruning, which converts dense models into MoE architectures with moderate sparsity to balance performance and inference efficiency, and downcycling approaches, which use pretrained dense models to initialize highly sparse MoE architectures. However, existing methods break the intrinsic activation patterns within dense models, leading to suboptimal expert construction. In this work, we argue that the Gated Linear Unit (GLU) mechanism provides a natural blueprint for dense-to-MoE conversion. We show that the fine-grained neuron-wise activation patterns of GLU reveal a coarse-grained structure, uncovering an inherent MoE architecture...
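The abstract's core observation is that GLU gate activations expose which FFN neurons fire together, and that this co-activation structure can seed experts. The sketch below illustrates one plausible reading of that idea, under stated assumptions: random weights stand in for a pretrained layer, a SiLU-gated GLU is assumed, and a simple k-means over per-neuron activation profiles stands in for whatever grouping procedure the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, for illustration only).
d_model, d_ff, n_tokens, n_experts = 16, 64, 256, 4

# Random weights standing in for a pretrained dense GLU FFN layer.
W_gate = rng.normal(size=(d_model, d_ff))
X = rng.normal(size=(n_tokens, d_model))

def silu(x):
    return x / (1.0 + np.exp(-x))

# GLU gate activations tell us which FFN neurons fire for which tokens.
gate_act = silu(X @ W_gate)             # (n_tokens, d_ff)
active = gate_act > 0.0                 # binary activation pattern

# Group neurons with similar activation profiles into candidate experts
# via plain k-means (a stand-in for the paper's grouping method).
profiles = active.T.astype(float)       # (d_ff, n_tokens) per-neuron profiles
centroids = profiles[rng.choice(d_ff, n_experts, replace=False)]
for _ in range(10):
    dists = ((profiles[:, None, :] - centroids[None]) ** 2).sum(-1)
    labels = dists.argmin(1)            # expert assignment per neuron
    for k in range(n_experts):
        if (labels == k).any():
            centroids[k] = profiles[labels == k].mean(0)

print([int((labels == k).sum()) for k in range(n_experts)])  # neurons per expert
```

Each resulting neuron group would then initialize one expert's FFN slice, so the converted MoE inherits the dense model's weights rather than being trained from scratch.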