[2603.24044] MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
Computer Science > Machine Learning

arXiv:2603.24044 (cs) [Submitted on 25 Mar 2026]

Title: MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
Authors: Andrea Manzoni

Abstract: Standard LoRA fine-tuning of Mixture-of-Experts (MoE) models applies adapters to every expert, yet our profiling shows that per-layer expert routing is highly skewed: a small subset of experts handles most tokens in each layer, while many others are rarely activated ("cold"). We propose MoE-Sieve, a simple routing-guided framework for LoRA fine-tuning, and pair it with a systematic profiling study of expert routing across architectures and tasks. The method is simple: profile routing counts on a small calibration set, select the top-k most-routed experts per layer, and apply LoRA only to those experts. Across two architecturally distinct MoE models and three diverse tasks, tuning only the top 25% of routed experts per layer remains competitive with full LoRA, with mean differences within +/-1 percentage point across all conditions. This reduces LoRA trainable parameters by 70-73%, adapter checkpoint size by 71-73%, and wall-clock training time by up to 50%. We also observe a non-monotonic relationship between expert count and seed-to-seed variance, consistent with the hypothesis that adapting cold experts can introduce gradient noise without improving accuracy.
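The selection step described in the abstract (profile routing counts on a calibration set, then keep the most-routed experts per layer) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `select_experts` and the dictionary-based routing log are assumptions, and in practice the counts would come from the router's top-k assignments recorded during a calibration forward pass.

```python
from collections import Counter

def select_experts(routing_log, top_frac=0.25):
    """Pick the most-routed experts per layer from a calibration routing log.

    routing_log: dict mapping layer index -> list of expert ids the router
                 assigned to calibration tokens in that layer (hypothetical
                 format; a real run would record router top-k choices).
    top_frac:    fraction of the observed experts to keep per layer.
    Returns:     dict mapping layer index -> list of selected expert ids.
    """
    selected = {}
    for layer, expert_ids in routing_log.items():
        counts = Counter(expert_ids)
        # Keep at least one expert; note len(counts) covers only experts
        # actually observed, so truly cold experts are excluded by default.
        k = max(1, int(len(counts) * top_frac))
        selected[layer] = [expert for expert, _ in counts.most_common(k)]
    return selected

# Toy calibration log: two layers with skewed routing.
log = {0: [0, 0, 0, 1, 1, 2, 3], 1: [2, 2, 2, 2, 1, 0]}
print(select_experts(log, top_frac=0.5))  # → {0: [0, 1], 1: [2]}
```

LoRA adapters would then be attached only to the returned experts in each layer, leaving the cold experts' weights untouched during fine-tuning.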