[2505.16056] Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
Computer Science > Machine Learning

arXiv:2505.16056 (cs)

[Submitted on 21 May 2025 (v1), last revised 28 Feb 2026 (this version, v4)]

Title: Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models

Authors: Jingcong Liang, Siyuan Wang, Miren Tian, Yitong Li, Duyu Tang, Zhongyu Wei

Abstract: Mixture-of-Experts (MoE) enables efficient scaling of large language models (LLMs) by sparsely activating experts during inference. To deploy large MoE models on memory-constrained devices, many systems introduce *expert offloading*, which caches a subset of experts in fast memory and leaves the rest in slow memory to run on CPU or be loaded on demand. While some research has exploited the locality of expert activations, where consecutive tokens activate similar experts, the degree of this **local routing consistency** varies across models and remains understudied. In this paper, we propose two metrics to measure the local routing consistency of MoE models: (1) **Segment Routing Best Performance (SRP)**, which evaluates how well a fixed group of experts can cover the needs of a segment of tokens, and (2) **Segment Cache Best Hit Rate (SCH)**, which measures the best achievable hit rate of an expert cache that exploits a window of future routing information under a cache size limit. We analyze 20 MoE LLMs with div...
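To make the SRP metric concrete, the sketch below computes a plausible version of it from a routing trace: for each fixed-length segment of tokens, it selects the best group of experts for that segment and measures what fraction of the segment's activations the group covers. This is a hypothetical formulation for illustration; the function name, the trace format (a list of per-token activated-expert id lists), and the exact averaging are assumptions, not the paper's definitions.

```python
from collections import Counter

def segment_routing_best_performance(trace, segment_len, group_size):
    """Average, over segments, of the fraction of expert activations
    covered by the single best group of `group_size` experts.

    trace       -- list of per-token lists of activated expert ids
    segment_len -- number of consecutive tokens per segment
    group_size  -- number of experts the fixed group may contain

    Hypothetical sketch: the paper's exact SRP definition may differ.
    """
    scores = []
    for start in range(0, len(trace), segment_len):
        segment = trace[start:start + segment_len]
        counts = Counter(e for token in segment for e in token)
        total = sum(counts.values())
        if total == 0:
            continue
        # Best fixed group = the group_size most frequently activated
        # experts within this segment.
        covered = sum(c for _, c in counts.most_common(group_size))
        scores.append(covered / total)
    return sum(scores) / len(scores) if scores else 0.0
```

Under this formulation, a model whose consecutive tokens reuse the same experts scores near 1.0, while a model that spreads activations across many experts per segment scores lower; SCH would analogously simulate a size-limited cache with lookahead over the same trace.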