[2510.05077] SLM-MUX: Orchestrating Small Language Models for Reasoning
Summary
The paper presents SLM-MUX, a novel architecture for orchestrating small language models (SLMs) to improve reasoning accuracy, achieving significant performance gains over existing methods.
Why It Matters
As the use of small language models increases, optimizing their orchestration can lead to better efficiency and accuracy in AI applications. This research addresses a critical gap in current methodologies, providing a framework that enhances the capabilities of SLMs, which are often overlooked in favor of larger models. The findings could influence future developments in AI systems, particularly in resource-constrained environments.
Key Takeaways
- SLM-MUX orchestrates multiple small language models for improved reasoning.
- The proposed method shows up to 13.4% accuracy improvement on specific benchmarks.
- SLM-MUX outperforms larger models like Qwen 2.5 72B in certain tasks with just two SLMs.
- The architecture is adaptable to various model classes, enhancing its applicability.
- The research provides theoretical backing for the effectiveness of the proposed orchestration method.
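To make the orchestration idea concrete, here is a minimal, hypothetical sketch of multiplexing several SLMs: each model is sampled multiple times, its confidence is estimated as the agreement among its own samples, and the answer from the most self-consistent model is returned. This is an illustrative stand-in, not the paper's actual SLM-MUX algorithm; the stub models and the self-consistency proxy are assumptions for demonstration.

```python
import random
from collections import Counter

def sample_answers(model, question, k=5):
    """Draw k sampled answers from a model (stubbed here)."""
    return [model(question) for _ in range(k)]

def slm_mux(models, question, k=5):
    """Illustrative mux: return the answer of the model whose samples
    agree with each other most often (a self-consistency proxy)."""
    best_answer, best_conf = None, -1.0
    for model in models:
        answers = sample_answers(model, question, k)
        answer, votes = Counter(answers).most_common(1)[0]
        conf = votes / k  # fraction of samples agreeing on the top answer
        if conf > best_conf:
            best_answer, best_conf = answer, conf
    return best_answer, best_conf

# Toy stand-ins for SLM calls (hypothetical, not real model APIs).
model_a = lambda q: "42"                               # always consistent
model_b = lambda q: random.choice(["41", "42", "43"])  # noisy

answer, conf = slm_mux([model_a, model_b], "6 * 7 = ?", k=5)
print(answer, conf)  # the consistent model wins: 42 1.0
```

The design choice illustrated here is that agreement across a model's own samples serves as a cheap confidence signal, letting the mux arbitrate between models without a separate judge.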
Computer Science > Computation and Language
arXiv:2510.05077 (cs)
[Submitted on 6 Oct 2025 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: SLM-MUX: Orchestrating Small Language Models for Reasoning
Authors: Chenyu Wang, Zishen Wan, Hao Kang, Emma Chen, Zhiqiang Xie, Tushar Krishna, Vijay Janapa Reddi, Yilun Du
Abstract: With the rapid development of language models, the number of small language models (SLMs) has grown significantly. Although they do not achieve state-of-the-art accuracy, they are more efficient and often excel at specific tasks. This raises a natural question: can multiple SLMs be orchestrated into a system where each contributes effectively, achieving higher accuracy than any individual model? Existing orchestration methods have primarily targeted frontier models (e.g., GPT-4) and perform suboptimally when applied to SLMs. To address this gap, we propose a three-stage approach for orchestrating SLMs. First, we introduce SLM-MUX, a multi-model architecture that effectively coordinates multiple SLMs. Building on this, we develop two optimization strategies: (i) a model selection search that identifies the most complementary SLMs from a given pool, and (ii) test-time scaling tailored to SLM-MUX. Our approach delivers strong results: Compared to existing orchestration methods, our approach achieves u...
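The abstract's model selection search, which "identifies the most complementary SLMs from a given pool," could be realized in its simplest form as an exhaustive search over model pairs that maximizes union coverage on a validation set. The sketch below assumes hypothetical per-question correctness records for each model; the pool names and records are invented for illustration and the paper's actual search may differ.

```python
from itertools import combinations

def select_complementary_pair(correct, n_questions):
    """Pick the pair of models whose union of correctly answered
    validation questions is largest (a simple complementarity proxy).
    `correct` maps model name -> set of question indices it got right."""
    best_pair, best_cov = None, -1
    for a, b in combinations(sorted(correct), 2):
        cov = len(correct[a] | correct[b])  # questions either model solves
        if cov > best_cov:
            best_pair, best_cov = (a, b), cov
    return best_pair, best_cov / n_questions

# Hypothetical validation records for a pool of four SLMs.
records = {
    "slm-math": {0, 1, 2, 3},
    "slm-code": {4, 5, 6, 7},
    "slm-chat": {0, 1, 4, 5},
    "slm-gen":  {0, 2, 4, 6},
}
pair, coverage = select_complementary_pair(records, n_questions=8)
print(pair, coverage)  # two individually weak but complementary models cover all 8
```

Note how the two selected models each solve a disjoint half of the validation set: complementarity, not individual accuracy, is what the search rewards, which matches the intuition that a pair of specialized SLMs can jointly outperform any single model.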