[2601.14327] Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM
Computer Science > Machine Learning
arXiv:2601.14327 (cs)
[Submitted on 20 Jan 2026 (v1), last revised 5 Mar 2026 (this version, v3)]

Title: Yuan3.0 Ultra: A Trillion-Parameter Enterprise-Oriented MoE LLM
Authors: YuanLab.ai: Shawn Wu, Jiangang Luo, Darcy Chen, Sean Wang, Louie Li, Allen Wang, Xudong Zhao, Tong Yu, Bach Li, Joseph Shen, Gawain Ma, Jasper Jia, Marcus Mao, Claire Wang, Hunter He, Carol Wang, Zera Zhang, Jason Wang, Chonly Shen, Leo Zhang, Logan Chen, Qasim Meng, James Gong, Daniel Zhao, Penn Zheng, Owen Zhu

Abstract: We introduce Yuan3.0 Ultra, an open-source Mixture-of-Experts (MoE) large language model featuring 68.8B activated parameters and 1010B total parameters, specifically designed to enhance performance on enterprise-scenario tasks while maintaining competitive capabilities on general-purpose tasks. We propose the Layer-Adaptive Expert Pruning (LAEP) algorithm, designed for the pre-training stage of MoE LLMs. In contrast to previous expert pruning approaches, which operate primarily in the post-training phase, the proposed algorithm improves training efficiency by selectively pruning underutilized experts and reorganizing experts across computing devices according to token distribution statistics. Comprehensive experiments demonstrate that LAEP effectively reduces model size and substantially i...
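The abstract describes LAEP only at a high level: track how many tokens the router sends to each expert, prune experts that receive too few, and rebalance the survivors across devices. Below is a minimal Python sketch of that stated idea, not the paper's actual implementation; the utilization threshold, the greedy load-balancing heuristic, and all function names are illustrative assumptions.

```python
# Illustrative sketch of utilization-based expert pruning and device
# rebalancing, following the abstract's description of LAEP. The threshold
# and the greedy bin-packing step are assumptions, not the paper's method.
import numpy as np

def prune_underutilized_experts(token_counts, threshold=0.5):
    """For one MoE layer, keep experts whose share of routed tokens is at
    least `threshold` times the uniform share (1 / num_experts).

    token_counts: 1-D array of tokens routed to each expert in this layer.
    Returns the indices of experts to keep.
    """
    shares = token_counts / token_counts.sum()
    uniform = 1.0 / len(token_counts)
    return np.where(shares >= threshold * uniform)[0]

def rebalance_across_devices(kept_loads, num_devices):
    """Greedy bin-packing: place each kept expert (heaviest first) on the
    currently least-loaded device, so per-device token load stays roughly
    even after pruning. Keys of `assignment` index into `kept_loads`."""
    order = np.argsort(kept_loads)[::-1]
    device_load = np.zeros(num_devices)
    assignment = {}
    for expert in order:
        dev = int(np.argmin(device_load))
        assignment[int(expert)] = dev
        device_load[dev] += kept_loads[expert]
    return assignment, device_load

# Toy usage: one layer with 8 experts, two of them nearly dead.
counts = np.array([9000, 8500, 120, 7800, 9100, 80, 8800, 8600])
keep = prune_underutilized_experts(counts, threshold=0.5)
assignment, loads = rebalance_across_devices(counts[keep], num_devices=2)
print("kept experts:", keep.tolist())          # experts 2 and 5 are pruned
print("device assignment:", assignment)
print("per-device load:", loads.tolist())
```

In this toy run, experts 2 and 5 fall below half of the uniform 1/8 token share and are pruned, and the six remaining experts are split across two devices with nearly equal load, which is the kind of reorganization by token distribution statistics the abstract attributes to LAEP.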