[2603.25284] SliderQuant: Accurate Post-Training Quantization for LLMs
Computer Science > Artificial Intelligence

arXiv:2603.25284 (cs) [Submitted on 26 Mar 2026]

Title: SliderQuant: Accurate Post-Training Quantization for LLMs

Authors: Shigeng Wang, Chao Li, Yangyuxuan Kang, Jiawei Fan, Zhonghong Ou, Anbang Yao

Abstract: In this paper, we address post-training quantization (PTQ) for large language models (LLMs) from an overlooked perspective: given a pre-trained high-precision LLM, the predominant sequential quantization framework treats all layers equally, which may not be optimal in challenging bit-width settings. We empirically study the impact of quantizing different layers on model accuracy and observe that: (1) shallow and deep layers are usually more sensitive to quantization than intermediate layers; (2) among the shallow and deep layers, the most sensitive are the first and last layers, which exhibit significantly larger quantization errors than the others. These observations imply that the quantization design for the different layers of an LLM should operate on multiple levels rather than a single level shared by all layers. Motivated by this, we propose a new PTQ framework termed Sliding-layer Quantization (SliderQuant), which relies on a simple adaptive sliding quantization concept facilitated by a few learnable parameters. The base component of SliderQuant is called inter-layer sliding quantization...
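The layer-sensitivity observation in the abstract can be illustrated with a minimal, hypothetical sketch (not the paper's actual method): quantize each layer's weights in isolation with a simple symmetric uniform quantizer and compare the resulting mean-squared quantization error. The layer shapes, scales, and the `quantize` helper below are all illustrative assumptions; in this toy setup, layers whose weight distributions have larger dynamic range (stand-ins for the first/last layers) incur visibly larger error.

```python
import numpy as np

def quantize(w, bits=4):
    """Symmetric uniform quantization to `bits` bits (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)

# Toy stand-in for an 8-layer model: give the first and last layers a wider
# weight distribution, mimicking the higher sensitivity reported for them.
layers = [
    rng.normal(scale=1.5 if i in (0, 7) else 1.0, size=(64, 64))
    for i in range(8)
]

# Per-layer quantization error: quantize each layer independently and
# measure the mean-squared deviation from the full-precision weights.
errors = [float(np.mean((w - quantize(w)) ** 2)) for w in layers]

for i, e in enumerate(errors):
    print(f"layer {i}: mse = {e:.5f}")
```

Under these assumptions the printed errors for layers 0 and 7 exceed those of the intermediate layers, echoing the paper's observation that a single quantization scheme shared by all layers leaves accuracy on the table at low bit-widths.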