[2602.15563] 1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
Summary
The paper presents an empirical study of quantization-aware training (QAT) in the low-bit regime, demonstrating that k-means based weight quantization significantly outperforms traditional integer formats on generative tasks.
Why It Matters
As machine learning models grow in size, efficient quantization methods are crucial for reducing memory usage without sacrificing performance. This research provides insights into optimizing QAT in low-bit regimes, which is vital for deploying large language models in resource-constrained environments.
Key Takeaways
- K-means based weight quantization outperforms traditional integer formats.
- Under a fixed inference memory budget, 1-bit quantized weights yield the best performance.
- The study addresses gaps in understanding the trade-offs of quantization in QAT.
- Empirical results provide a clearer picture of quantization impacts on generative tasks.
- Optimizing quantization methods is essential for efficient deployment of large models.
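To make the core idea concrete, here is a minimal sketch of k-means weight quantization: each weight tensor is approximated by a small codebook of 2^b centroids (learned with Lloyd's algorithm) plus a b-bit index per weight. The function name, the quantile initialization, and the iteration count are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kmeans_quantize(weights, bits=1, iters=25):
    """Quantize a weight tensor to 2**bits values via 1-D k-means.

    Illustrative sketch only: returns (codebook, indices) such that
    codebook[indices] approximates the original weights.
    """
    w = weights.ravel()
    k = 2 ** bits
    # Initialize centroids from quantiles so they span the weight range
    # (an assumed initialization, not necessarily the paper's choice).
    codebook = np.quantile(w, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        # Assignment step: map each weight to its nearest centroid.
        idx = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        # Update step: move each centroid to the mean of its cluster.
        for j in range(k):
            if np.any(idx == j):
                codebook[j] = w[idx == j].mean()
    return codebook, idx.reshape(weights.shape)

# Example: with bits=1 the whole tensor is represented by just two values
# plus one bit per weight, matching the paper's 1-bit setting.
w = np.random.default_rng(0).normal(size=(64, 64)).astype(np.float32)
codebook, idx = kmeans_quantize(w, bits=1)
w_hat = codebook[idx]  # dequantized approximation of w
```

Because the codebook is tiny and dequantization is a table lookup, this format maps well onto standard hardware, which is part of why the paper finds it practical for QAT.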
Computer Science > Machine Learning
arXiv:2602.15563 [cs.LG] (Submitted on 17 Feb 2026)
Title: 1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization
Authors: Sohir Maskey, Constantin Eichenberg, Johannes Messner, Douglas Orr
Abstract
Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge in practice. The full design space of quantization is not fully explored in the context of QAT, and the precise trade-off between quantization and downstream performance is poorly understood, as comparisons often rely solely on perplexity-based evaluations. In this work, we address these shortcomings with an empirical study of QAT in the low-bit regime. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware. Furthermore, we find that, under a fixed inference memory budget, the best performance on generative downstream tasks is achieved with $1$-bit quantized weights.