[2507.06567] SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference
Computer Science > Machine Learning
arXiv:2507.06567 (cs)
[Submitted on 9 Jul 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Authors: Qian Chen, Xianhao Chen, Kaibin Huang

Abstract: Mixture-of-Experts (MoE) models improve the scalability of large language models (LLMs) by activating only a small subset of relevant experts per input. However, the sheer number of expert networks in an MoE model imposes a significant storage and memory burden on edge devices. To address this challenge, we consider a scenario in which experts are dispersed across an edge network for distributed inference. Based on the popular Top-$K$ expert selection strategy, we formulate a latency minimization problem that optimizes expert caching on edge servers under storage constraints. When $K=1$, the problem reduces to monotone submodular maximization with a knapsack constraint, for which we design a greedy-based algorithm with a $(1 - 1/e)$-approximation guarantee. For the general case where $K \geq 1$, expert co-activation within the same MoE layer introduces non-submodularity, rendering greedy methods ineffective. To tackle this issue, we propose a successive greedy decomposition method that decomposes the original problem into a series of subproblems...
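The $K=1$ case described in the abstract can be illustrated with a small sketch: a cost-benefit greedy that fills a storage budget with the experts giving the best marginal-utility-per-byte, which is the standard heuristic for monotone submodular maximization under a knapsack constraint (the full $(1-1/e)$ guarantee additionally requires partial enumeration, omitted here). The expert names, sizes, and routing probabilities below are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only: cost-benefit greedy for caching experts on an
# edge server under a storage budget, assuming a monotone submodular utility.
def greedy_cache(expert_sizes, utility, budget):
    """expert_sizes: dict expert -> storage cost;
    utility: callable on a set of experts (monotone submodular);
    budget: total storage capacity of the edge server."""
    cached, used = set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for expert, size in expert_sizes.items():
            if expert in cached or used + size > budget:
                continue  # already cached, or would exceed the budget
            gain = utility(cached | {expert}) - utility(cached)
            ratio = gain / size  # marginal utility per unit of storage
            if ratio > best_ratio:
                best, best_ratio = expert, ratio
        if best is None:
            break  # no feasible expert improves the utility
        cached.add(best)
        used += expert_sizes[best]
    return cached

# Hypothetical per-expert routing probabilities and storage sizes.
demand = {"e1": 0.5, "e2": 0.3, "e3": 0.2}
sizes = {"e1": 2.0, "e2": 1.0, "e3": 1.0}
# A modular (hence submodular) utility: expected fraction of requests served.
util = lambda cached: sum(demand[e] for e in cached)

print(sorted(greedy_cache(sizes, util, budget=2.0)))  # → ['e2', 'e3']
```

With a budget of 2.0, the greedy prefers `e2` and `e3` (ratios 0.3 and 0.2 per unit) over the larger `e1` (ratio 0.25), caching the pair that maximizes served demand within the budget.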