[2510.18259] Learning under Quantization for High-Dimensional Linear Regression
Summary
This paper explores the impact of low-bit quantization on high-dimensional linear regression, providing a theoretical framework for understanding its effects on learning performance.
Why It Matters
As machine learning models grow in complexity, efficient training techniques like quantization become crucial. This study offers insights into how different quantization methods affect learning dynamics, which can inform better model training under hardware constraints.
Key Takeaways
- Quantization can amplify noise during training, affecting learning outcomes.
- Different quantization schemes (additive vs. multiplicative) have distinct impacts on data distortion and noise amplification.
- The study establishes algorithm-dependent and data-dependent excess risk bounds for each quantization target.
- Understanding quantization's effects can inform the design of quantization-aware optimization algorithms.
- This research lays the groundwork for further exploration of learning theory in practical hardware contexts.
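To make the distinction in the takeaways concrete, here is a minimal numerical sketch (not the paper's construction; function names and the `levels`/`step` parameters are illustrative) contrasting an additive quantizer, whose error is uniformly bounded by half the constant step, with a multiplicative quantizer, whose error scales with the magnitude of the input:

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_additive(x, step=0.1):
    """Additive (uniform) quantizer: rounds onto a fixed grid of width `step`.
    The error |q(x) - x| is bounded by step/2 regardless of |x|."""
    return step * np.round(x / step)

def quantize_multiplicative(x, levels=16):
    """Multiplicative (relative) quantizer: rounds log2|x| onto a grid, so the
    error |q(x) - x| is proportional to |x| (bounded relative error)."""
    sign = np.sign(x)
    mag = np.where(np.abs(x) == 0, 1e-12, np.abs(x))  # avoid log(0)
    grid = np.round(np.log2(mag) * levels) / levels
    return sign * 2.0 ** grid

x = rng.normal(size=100_000)
err_add = quantize_additive(x) - x
err_mul = quantize_multiplicative(x) - x

print(np.max(np.abs(err_add)))                        # bounded by step/2 = 0.05
print(np.corrcoef(np.abs(x), np.abs(err_mul))[0, 1])  # error grows with |x|
```

This is the mechanism behind the paper's distinction: under additive quantization with a constant step, the injected distortion stays bounded independently of the signal scale, while a multiplicative scheme injects distortion proportional to the quantized quantity.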
arXiv:2510.18259 (stat) — Statistics > Machine Learning
Submitted on 21 Oct 2025 (v1); last revised 16 Feb 2026 (this version, v3)
Title: Learning under Quantization for High-Dimensional Linear Regression
Authors: Dechen Zhang, Junwei Su, Difan Zou
Abstract: The use of low-bit quantization has emerged as an indispensable technique for enabling the efficient training of large-scale models. Despite its widespread empirical success, a rigorous theoretical understanding of its impact on learning performance remains notably absent, even in the simplest linear regression setting. We present the first systematic theoretical study of this fundamental question, analyzing finite-step stochastic gradient descent (SGD) for high-dimensional linear regression under a comprehensive range of quantization targets: data, label, parameter, activation, and gradient. Our novel analytical framework establishes precise algorithm-dependent and data-dependent excess risk bounds that characterize how different quantization affects learning: parameter, activation, and gradient quantization amplify noise during training; data quantization distorts the data spectrum and introduces additional approximation error. Crucially, we distinguish the effects of two quantization schemes: we prove that for additive quantization (with constant quantization steps), the noise ...
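The abstract's claim that gradient quantization amplifies noise during training can be illustrated with a small simulation. The sketch below (a toy experiment under assumed settings, not the paper's analysis; `additive_q` uses unbiased stochastic rounding onto a constant-step grid) runs finite-step SGD on a linear regression problem with and without quantized gradients and compares the resulting excess risk:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_steps, lr = 20, 2000, 0.01
w_star = rng.normal(size=d) / np.sqrt(d)  # ground-truth parameter

def additive_q(g, step=1.0):
    """Unbiased stochastic rounding onto a grid of constant width `step`:
    E[q(g)] = g, but each step injects variance up to step**2 / 4."""
    low = step * np.floor(g / step)
    p = (g - low) / step
    return low + step * (rng.random(g.shape) < p)

def sgd_excess_risk(quantize=None):
    """Single-sample SGD on y = <w*, x> + noise; returns the excess risk
    0.5 * ||w - w*||^2 (the population risk gap for isotropic Gaussian x)."""
    w = np.zeros(d)
    for _ in range(n_steps):
        x = rng.normal(size=d)
        y = x @ w_star + 0.1 * rng.normal()
        g = (x @ w - y) * x               # stochastic gradient of squared loss
        if quantize is not None:
            g = quantize(g)               # quantize the gradient before the step
        w -= lr * g
    diff = w - w_star
    return 0.5 * (diff @ diff)

risk_exact = sgd_excess_risk()
risk_quant = sgd_excess_risk(additive_q)
print(risk_exact, risk_quant)  # quantized gradients leave a higher risk floor
```

Because the stochastic rounding is unbiased, SGD still converges in expectation; the quantization only adds variance to each update, which accumulates into a higher steady-state excess risk. This matches the qualitative picture in the abstract, where the bounds quantify exactly how large that amplification is for each quantization target.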