[2603.03597] NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
Computer Science > Machine Learning
arXiv:2603.03597 (cs)
[Submitted on 4 Mar 2026]

Title: NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training

Authors: Hadi Mohaghegh Dolatabadi, Thalaiyasingam Ajanthan, Sameera Ramasinghe, Chamin P Hewa Koneputugodage, Shamane Siriwardhana, Violetta Shevchenko, Karol Pajak, James Snewin, Gil Avraham, Alexander Long

Abstract: The rapid progress of large language models (LLMs) is increasingly constrained by memory and deployment costs, motivating compression methods for practical deployment. Many state-of-the-art compression pipelines leverage the low-rank structure of trained weight matrices, a phenomenon often associated with the properties of popular optimizers such as Adam. In this context, Muon is a recently proposed optimizer that improves LLM pretraining via full-rank update steps, but its induced weight-space structure has not been characterized yet. In this work, we report a surprising empirical finding: despite imposing full-rank updates, Muon-trained models exhibit pronounced low-rank structure in their weight matrices and are readily compressible under standard pipelines. Motivated by this insight, we propose NuMuon, which augments Muon with a nuclear-norm constraint on the update direction, further constraining the learned weights toward lo...
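
As a rough illustration of what "low-rank structure" and "compressible" mean in the abstract, the sketch below compresses a toy matrix by truncated SVD and computes its nuclear norm (the sum of its singular values). It is not the NuMuon algorithm and not the compression pipeline used in the paper. The helper names (`truncated_svd_compress`, `nuclear_norm`, `relative_error`), the matrix sizes, and the synthetic data are assumptions chosen for illustration only.

```python
import numpy as np


def truncated_svd_compress(W, rank):
    """Return factors (A, B) with A @ B the best rank-`rank` approximation of W."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # scale the kept left singular vectors
    B = Vt[:rank, :]            # kept right singular vectors
    return A, B


def nuclear_norm(W):
    """Nuclear norm: the sum of the singular values of W."""
    return np.linalg.svd(W, compute_uv=False).sum()


def relative_error(W, A, B):
    """Relative Frobenius-norm error of the factorized approximation."""
    return np.linalg.norm(W - A @ B) / np.linalg.norm(W)


# Hypothetical toy "weight matrix": rank 8 plus small noise (synthetic data).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 48))
W += 1e-3 * rng.standard_normal((64, 48))

A, B = truncated_svd_compress(W, rank=8)
err = relative_error(W, A, B)

# Storing A and B takes 64*8 + 8*48 numbers instead of 64*48.
params_full = W.size
params_low_rank = A.size + B.size
```

When a matrix is close to low rank, as in this toy case, the two factors store far fewer numbers than the full matrix while the reconstruction error stays small. That is the sense in which low-rank weights are readily compressible.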