[2602.17798] Grassmannian Mixture-of-Experts: Concentration-Controlled Routing on Subspace Manifolds
Summary
The paper presents Grassmannian Mixture-of-Experts (GrMoE), a novel routing framework that enhances expert assignment in machine learning models by controlling sparsity and utilization through a concentration matrix on subspace manifolds.
Why It Matters
GrMoE addresses a limitation of traditional Mixture-of-Experts models, whose softmax gating offers no principled control over the tradeoff between sparsity and expert utilization. By providing a continuous, interpretable mechanism for routing control, GrMoE can improve both model performance and interpretability, which matters for systems that must allocate compute efficiently across experts.
Key Takeaways
- GrMoE introduces a concentration matrix that controls routing entropy for expert assignment.
- The framework allows for uncertainty-aware expert assignment, reducing expert collapse.
- It achieves better load balance and lower perplexity than standard softmax-gated MoE baselines.
- The model supports post-hoc sparsity tuning without the need for retraining.
- Experts exhibit heterogeneous concentration values, indicating specialization in tasks.
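The entropy-control idea in the takeaways above can be sketched numerically. The following is a minimal NumPy illustration, not the paper's implementation: it assumes an isotropic concentration matrix $\Lambda = \kappa I$ (the paper's $\Lambda$ is generally a full matrix), random orthonormal bases standing in for learned expert subspaces, and hypothetical names (`gate`, `kappa`). Each expert scores a token by the squared norm of its projection onto the expert's subspace, scaled by the concentration; routing entropy then falls smoothly as the scalar knob grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal_basis(d, r, rng):
    """Random d x r orthonormal basis: a point on the Grassmannian Gr(r, d)."""
    q, _ = np.linalg.qr(rng.normal(size=(d, r)))
    return q[:, :r]

d, r, n_experts = 16, 4, 8
# Stand-ins for learned expert subspaces (assumption: random, not trained).
bases = [orthonormal_basis(d, r, rng) for _ in range(n_experts)]

def gate(x, bases, kappa):
    """Soft routing weights from Bingham-style log-scores.

    Expert e scores kappa * ||U_e^T x||^2, i.e. a quadratic form with an
    isotropic concentration Lambda = kappa * I; a softmax over the scores
    gives the routing distribution.
    """
    x = x / np.linalg.norm(x)
    scores = np.array([kappa * np.sum((U.T @ x) ** 2) for U in bases])
    w = np.exp(scores - scores.max())
    return w / w.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

x = rng.normal(size=d)
for kappa in (0.0, 5.0, 50.0):
    w = gate(x, bases, kappa)
    print(f"kappa={kappa:5.1f}  entropy={entropy(w):.3f}  max weight={w.max():.3f}")
```

At `kappa = 0` the gate is uniform (maximum entropy, `log(8)` for 8 experts); increasing `kappa` concentrates mass on the best-aligned experts, replacing a hard top-$k$ cutoff with a smooth dial.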
Computer Science > Machine Learning
arXiv:2602.17798 (cs) [Submitted on 19 Feb 2026]
Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
Abstract
Mixture-of-Experts models rely on learned routers to assign tokens to experts, yet standard softmax gating provides no principled mechanism to control the tradeoff between sparsity and utilization. We propose Grassmannian MoE (GrMoE), a routing framework that operates on the Grassmannian manifold of subspaces, where gating weights arise from the concentration parameters of Matrix Bingham distributions. This construction yields a single, interpretable knob -- the concentration matrix $\Lambda$ -- that continuously controls routing entropy, replacing discrete top-$k$ selection with a smooth, geometrically principled sparsity mechanism. We further develop an amortized variational inference procedure for posterior routing distributions, enabling uncertainty-aware expert assignment that naturally resists expert collapse. We formally prove tight bounds relating the Bingham concentration spectrum to routing entropy, expected top-$k$ mass, and an exponential bound on expert collapse, establishing the first formal theory of concentration-controlled sparsit...
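The abstract's claim that a scalar tightening of the concentration controls expected top-$k$ mass suggests a concrete picture of post-hoc sparsity tuning. The sketch below is a hypothetical illustration, not the paper's procedure: it assumes a scalar concentration `kappa` (the paper's $\Lambda$ is a full matrix), fixed random subspace bases standing in for trained parameters, and a made-up helper `tune_kappa`. Because the top-1 routing mass increases monotonically with the concentration scale, a simple bisection can hit a target sparsity level at inference time, with no retraining of the bases.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, n_experts = 16, 4, 8
# Fixed "learned" subspace bases; only the concentration knob changes below.
bases = [np.linalg.qr(rng.normal(size=(d, r)))[0] for _ in range(n_experts)]

def routing_weights(x, kappa):
    """Softmax over Bingham-style scores kappa * ||U_e^T x||^2."""
    x = x / np.linalg.norm(x)
    scores = np.array([kappa * np.sum((U.T @ x) ** 2) for U in bases])
    w = np.exp(scores - scores.max())
    return w / w.sum()

def tune_kappa(x, target_top1, lo=0.0, hi=1e4, iters=60):
    """Bisect on the scalar concentration until the largest routing weight
    reaches target_top1 (hypothetical post-hoc tuning; the subspaces are
    never re-trained, only the knob moves)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if routing_weights(x, mid).max() < target_top1:
            lo = mid
        else:
            hi = mid
    return hi

x = rng.normal(size=d)
kappa = tune_kappa(x, target_top1=0.9)
print(f"kappa={kappa:.1f}  top-1 mass={routing_weights(x, kappa).max():.3f}")
```

The design point this illustrates: with top-$k$ gating, sparsity is a discrete training-time choice; with a concentration knob, it becomes a continuous quantity one can solve for after training.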