[2603.22370] FAAR: Format-Aware Adaptive Rounding for NVFP4
Computer Science > Machine Learning
arXiv:2603.22370 (cs)
[Submitted on 23 Mar 2026]

Title: FAAR: Format-Aware Adaptive Rounding for NVFP4
Authors: Hanglin Li, Shuchang Tian, Chen Lin, Zhiyong Zhao, Kun Zhan

Abstract: Deploying large language models (LLMs) on edge devices requires extremely low-bit quantization. Ultra-low-precision formats such as NVFP4 offer a promising route to reducing memory footprint and accelerating computation. However, existing quantization methods typically rely on conventional rounding strategies and fail to account for the non-uniformity of the NVFP4 numerical grid, resulting in suboptimal rounding decisions and amplified quantization errors. To address this, we propose Format-Aware Adaptive Rounding (FAAR), a learnable rounding strategy tailored to the NVFP4 format. Unlike conventional quantization paradigms, FAAR explicitly incorporates the non-uniform NVFP4 grid into the optimization process. By adaptively adjusting rounding decisions guided by loss gradients, our method closely approximates the theoretically optimal quantization. To complement FAAR, we introduce a two-stage Format Alignment (2FA) fine-tuning scheme that aligns LLM parameters layer by layer to the NVFP4 numerical space, further narrowing the performance gap. Remarkably, this learnable optimization incurs minimal training overhead...
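As background for the grid non-uniformity the abstract refers to: NVFP4 encodes weights as 4-bit FP4 (E2M1) codes sharing a per-block scale, and the E2M1 magnitude grid {0, 0.5, 1, 1.5, 2, 3, 4, 6} has unequal spacing between adjacent values. The sketch below illustrates only the conventional round-to-nearest baseline on this grid that the paper argues is suboptimal, not the FAAR method itself; the block size of 16 and the max-to-6.0 scaling rule are standard NVFP4 conventions assumed here for illustration.

```python
import numpy as np

# E2M1 representable magnitudes: spacing is 0.5 near zero but 2.0 at the top,
# which is the non-uniformity that format-unaware rounding ignores.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FULL_GRID = np.concatenate([-E2M1_GRID[::-1], E2M1_GRID])  # signed codebook

def nvfp4_round_nearest(x, block_size=16):
    """Baseline round-to-nearest quantization onto the NVFP4 grid.

    Each block of `block_size` values shares one scale, chosen so the
    block's absolute maximum maps to the largest FP4 magnitude (6.0).
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    for start in range(0, x.size, block_size):
        blk = x[start:start + block_size]
        scale = np.max(np.abs(blk)) / 6.0 if np.any(blk) else 1.0
        # nearest codeword on the non-uniform grid, then rescale back
        idx = np.argmin(np.abs(blk[:, None] / scale - FULL_GRID[None, :]), axis=1)
        out[start:start + block_size] = FULL_GRID[idx] * scale
    return out
```

Note how nearest-grid rounding alone need not minimize end-to-end loss: near the coarse top of the grid (e.g. between 4 and 6) the rounding error can be large, which is the regime where a learned, loss-guided rounding decision such as FAAR's can choose differently.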