[2602.14039] Geometry-Preserving Aggregation for Mixture-of-Experts Embedding Models
Summary
The paper presents Spherical Barycentric Aggregation (SBA), a new method for aggregating outputs in Mixture-of-Experts (MoE) embedding models, addressing geometric inconsistencies in traditional linear aggregation methods.
Why It Matters
This research is significant as it highlights the limitations of current aggregation techniques in MoE models, which can distort embedding representations. By introducing SBA, the authors provide a solution that preserves the geometric structure of embeddings, potentially improving performance in various NLP tasks.
Key Takeaways
- Traditional linear aggregation in MoE models can distort embeddings.
- Spherical Barycentric Aggregation (SBA) maintains geometric consistency.
- SBA separates the radial and angular components of expert outputs, preserving their hyperspherical structure.
- Experiments show SBA improves results in semantic similarity and clustering tasks.
- Geometric awareness in aggregation is crucial for MoE architectures.
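The inward-collapse effect described above is easy to see in a toy case: averaging unit vectors that point in different directions yields a vector strictly inside the unit sphere. A minimal sketch (pure Python, not from the paper):

```python
import math

def linear_agg(vectors, weights):
    """Standard MoE-style weighted linear summation of expert outputs."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Two unit-norm "expert outputs" with substantial angular separation (90 degrees).
e1, e2 = [1.0, 0.0], [0.0, 1.0]
mixed = linear_agg([e1, e2], [0.5, 0.5])
print(norm(mixed))  # ~0.707: the mixture is pulled off the unit sphere, toward the interior
```

The shrinkage grows with the angle between experts (the averaged norm is cos(θ/2) for two equally weighted unit vectors at angle θ), which is exactly the regime the paper identifies: tightly concentrated norms but large angular separation.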
Paper Details
Authors: Sajjad Kachuee, Mohammad Sharifkhani. Submitted on 15 Feb 2026 to arXiv (cs.CL, Computation and Language).
Abstract
Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Under this geometry, linear aggregation induces inward collapse toward the manifold interior, distorting vector magnitude and direction and reducing embedding comparability. To address this inconsistency, Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms. Experiments on selected tasks from the Massive Text Embedding Benchmark (MTEB), including semantic similarity, clustering, and duplicate question detection, demonstrate consistent performan...
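The abstract describes SBA only at a high level, so the exact operator is not reproduced here. One plausible reading of "separates radial and angular components" is: aggregate the norms (radius) and the unit directions (angle) independently, then recombine, so the output stays on the experts' shell. A hedged sketch under that assumption (the function name and recombination rule are illustrative, not the paper's definition):

```python
import math

def sba_like_aggregate(vectors, weights):
    """Geometry-preserving aggregation sketch: treat radius and direction separately.

    Assumed scheme (not the paper's exact operator):
      - radial component  = weighted mean of expert norms
      - angular component = weighted sum of unit directions, re-projected to the sphere
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    radius = sum(w * n for w, n in zip(weights, norms))

    dim = len(vectors[0])
    units = [[x / n for x in v] for v, n in zip(vectors, norms)]
    blended = [sum(w * u[i] for w, u in zip(weights, units)) for i in range(dim)]
    b_norm = math.sqrt(sum(x * x for x in blended))
    direction = [x / b_norm for x in blended]

    # Recombine: the output inherits the aggregated radius instead of collapsing inward.
    return [radius * x for x in direction]

out = sba_like_aggregate([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5])
print(math.sqrt(sum(x * x for x in out)))  # ~1.0: stays on the unit shell
```

Because routing weights enter only as scalars on norms and directions, a scheme like this plugs into existing MoE routers unchanged, which matches the abstract's compatibility claim.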