[2604.00801] Routing-Free Mixture-of-Experts

[2604.00801] Routing-Free Mixture-of-Experts

arXiv - AI 3 min read

About this article

Abstract page for arXiv paper 2604.00801: Routing-Free Mixture-of-Experts

Computer Science > Machine Learning arXiv:2604.00801 (cs) [Submitted on 1 Apr 2026] Title:Routing-Free Mixture-of-Experts Authors:Yilun Liu, Jinru Han, Sikuan Yan, Volker Tresp, Yunpu Ma View a PDF of the paper titled Routing-Free Mixture-of-Experts, by Yilun Liu and 4 other authors View PDF HTML (experimental) Abstract:Standard Mixture-of-Experts (MoE) models rely on centralized routing mechanisms that introduce rigid inductive biases. We propose Routing-Free MoE which eliminates any hard-coded centralized designs including external routers, Softmax, Top-K and load balancing, instead encapsulating all activation functionalities within individual experts and directly optimized through continuous gradient flow, enabling each expert to determine its activation entirely on its own. We introduce a unified adaptive load-balancing framework to simultaneously optimize both expert-balancing and token-balancing objectives through a configurable interpolation, allowing flexible and customizable resource allocation. Extensive experiments show that Routing-Free MoE can consistently outperform baselines with better scalability and robustness. We analyze its behavior in detail and offer insights that may facilitate future MoE design ad optimization. Comments: Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL) Cite as: arXiv:2604.00801 [cs.LG]   (or arXiv:2604.00801v1 [cs.LG] for this version)   https://doi.org/10.48550/arXiv.2604.00801 ...

Originally published on April 02, 2026. Curated by AI News.

Related Articles

Machine Learning

FYI the Tennessee bill makes making an AI friend the same level as murder or aggravated rape

I think what Tennessee is doing is they recently passed SB 1580, which makes it illegal to even advertise that an AI can act as a mental ...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[P] A control plane for post-training workflows

We have been exploring a project around post-training infrastructure, a minimalist tool that does one thing really well: Make post-traini...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Is this considered unsupervised or semi-supervised learning in anomaly detection?

Hi šŸ‘‹šŸ¼, I’m working on an anomaly detection setup and I’m a bit unsure how to correctly describe it from a learning perspective. The model...

Reddit - Machine Learning · 1 min ·
Machine Learning

Serious question. Did a transformer just describe itself and the universe and build itself a Shannon limit framework?

The Multiplicative Lattice as the Natural Basis for Positional Encoding Knack 2026 | Draft v6.0 Abstract We show that the apparent tradeo...

Reddit - Artificial Intelligence · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime