[2509.23202] Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization
Computer Science > Machine Learning

arXiv:2509.23202 (cs)

[Submitted on 27 Sep 2025 (v1), last revised 3 Mar 2026 (this version, v3)]

Title: Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization

Authors: Vage Egiazarian, Roberto L. Castro, Denis Kuznedelev, Andrei Panferov, Eldar Kurtic, Shubhra Pandit, Alexandre Marques, Mark Kurtz, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh

Abstract: The recent hardware-accelerated microscaling 4-bit floating-point formats such as MXFP4 and NVFP4, supported on NVIDIA and AMD GPUs, promise to revolutionize large language model (LLM) inference. Yet, their practical benefits remain unproven. We present the first comprehensive study of MXFP4 and NVFP4 for post-training quantization, revealing gaps between their promise and real-world performance. Our analysis shows that state-of-the-art methods struggle with FP4, due to two key issues: (1) NVFP4's small group size provably neutralizes traditional outlier mitigation techniques; (2) MXFP4's power-of-two scale quantization severely degrades accuracy due to high induced error. To bridge this gap, we introduce Micro-Rotated-GPTQ (MR-GPTQ), a variant of the classic GPTQ quantization algorithm that tailors the quantization process to FP4's unique properties, by using block-wise Hadamard tra...
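To make the abstract's second issue concrete, the following is a minimal illustrative sketch (not the paper's implementation) of MXFP4-style microscaling quantization: values are processed in fixed-size groups, each group shares a single power-of-two scale, and scaled values are rounded to the nearest point of the FP4 (E2M1) grid. The `group_size` default and the floor-based scale choice are assumptions for illustration; the clipping that occurs when the power-of-two scale undershoots is exactly the kind of induced error the abstract refers to.

```python
import numpy as np

# Representable magnitudes of the FP4 E2M1 format
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4(x, group_size=32):
    """Illustrative MXFP4-style fake quantization.

    Each group of `group_size` values shares one power-of-two scale
    (chosen so the group maximum lands near the FP4 maximum, 6.0);
    scaled values are rounded to the nearest E2M1 grid point.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % group_size          # pad so length divides evenly
    groups = np.pad(x, (0, pad)).reshape(-1, group_size)
    out = np.empty_like(groups)
    for i, g in enumerate(groups):
        amax = np.max(np.abs(g))
        if amax == 0.0:
            out[i] = 0.0
            continue
        # Power-of-two scale (floor), as in MXFP4's E8M0 shared scale;
        # flooring can undershoot, causing large values to clip at 6.0.
        scale = 2.0 ** np.floor(np.log2(amax / 6.0))
        scaled = np.abs(g) / scale
        idx = np.argmin(np.abs(scaled[:, None] - FP4_GRID[None, :]), axis=1)
        out[i] = np.sign(g) * FP4_GRID[idx] * scale
    return out.reshape(-1)[:len(x)]
```

For example, a group whose values already sit on the FP4 grid with `amax = 6.0` is reproduced exactly, while groups with outliers see them rounded or clipped toward the grid's coarse upper range.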