[2604.04701] MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition
Computer Science > Machine Learning

arXiv:2604.04701 (cs)

[Submitted on 6 Apr 2026]

Title: MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Authors: Seoungsub Lee, In Seo Kim, Seon Wook Kim

Abstract: Large language models (LLMs) have achieved outstanding performance across a wide range of natural language processing tasks, but their enormous parameter counts impose substantial memory and computational overheads. This challenge is particularly critical in NPU-based on-device environments, where FP16/FP32 computation is inefficient and integer (INT) quantization is therefore essential. However, existing methods, including ZeroQuant, LLM.int8(), and SmoothQuant, do not fully address input-activation outliers and the associated hardware inefficiencies. To overcome these limitations, we propose MUXQ (Mixed-to-Uniform Quantization). MUXQ detects outlier channels in input activations and introduces a small auxiliary matrix that redistributes outlier magnitudes across channels, thereby alleviating the outlier problem. This enables even activation outliers to be quantized at low-precision INT levels while preserving a hardware-friendly computation structure. Experiments on GPT-2 models at three scales (0.1B, 0.3B, and 0.7B parameters) using the WikiText-2 dataset s...
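The abstract describes MUXQ only at a high level, so the following is a minimal illustrative sketch of the generic low-rank outlier decomposition idea the title names, not the paper's actual algorithm. It assumes the common pattern: flag activation channels whose magnitudes are far above the rest, absorb them into a small low-rank auxiliary term via truncated SVD, and quantize the now well-behaved residual to INT8. All function names (`lowrank_outlier_decompose`, `quantize_int8`) and parameters (`rank`, `outlier_sigma`) are hypothetical choices for this sketch.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns (int8 tensor, scale)."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def lowrank_outlier_decompose(X, rank=4, outlier_sigma=3.0):
    """Split X into a low-rank outlier term L and a residual R = X - L.

    Hypothetical heuristic: a channel is an outlier if its max magnitude
    exceeds the mean of the per-channel maxima by outlier_sigma standard
    deviations. The outlier channels are approximated by a truncated SVD,
    so the residual has a compressed dynamic range and quantizes cleanly.
    """
    ch_max = np.abs(X).max(axis=0)
    thresh = ch_max.mean() + outlier_sigma * ch_max.std()
    outlier_cols = np.where(ch_max > thresh)[0]

    L = np.zeros_like(X)
    if outlier_cols.size > 0:
        sub = X[:, outlier_cols]
        U, S, Vt = np.linalg.svd(sub, full_matrices=False)
        k = min(rank, S.size)
        L[:, outlier_cols] = (U[:, :k] * S[:k]) @ Vt[:k]

    return L, X - L, outlier_cols

# Toy activation matrix with two artificially inflated channels.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 64))
X[:, [5, 40]] *= 50.0  # inject outlier channels

L, R, cols = lowrank_outlier_decompose(X)
Rq, scale = quantize_int8(R)
X_hat = L + Rq.astype(np.float64) * scale  # low-rank term + dequantized residual
print("outlier channels:", cols)
print("max reconstruction error:", np.abs(X - X_hat).max())
```

Because the low-rank term stays in floating point while the residual is uniformly INT8, a layer using this split keeps a single integer matmul plus a small rank-k correction, which is presumably the kind of hardware-friendly computation structure the abstract emphasizes.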