[2506.11087] Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
Summary
This article summarizes PrinMix, an SVD-based framework for delta compression in large language models (LLMs) that addresses the storage and distribution cost of fine-tuned delta parameters through mathematically grounded quantization.
Why It Matters
As organizations maintain many fine-tuned variants of large base models, efficiently storing and distributing each variant's delta parameters becomes critical. PrinMix offers a mathematically grounded approach to quantizing these deltas, potentially improving compression quality and resource management in AI deployments.
Key Takeaways
- PrinMix improves delta compression in LLMs using SVD-based methods.
- The framework models bit allocation as a 0/1 integer linear program, replacing heuristic quantization choices and improving generalizability.
- Experimental results show PrinMix outperforms existing methods on key benchmarks.
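The first two takeaways can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's method: the delta matrix is random, and the retained rank, bit widths, and allocation rule (more bits for components with larger singular values) are all invented for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for a base and a fine-tuned weight matrix.
W_base = rng.standard_normal((64, 64))
delta = 0.01 * rng.standard_normal((64, 64))  # dense delta from fine-tuning
W_ft = W_base + delta

# SVD gives a compact low-rank representation of the delta.
U, S, Vt = np.linalg.svd(delta, full_matrices=False)
r = 16                                         # retained rank (illustrative)
U_r, S_r, Vt_r = U[:, :r], S[:r], Vt[:r, :]

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

# Mixed precision: components tied to large singular values get more bits.
bits = np.where(np.arange(r) < r // 2, 8, 4)   # hypothetical allocation
U_q = np.column_stack([quantize(U_r[:, i], bits[i]) for i in range(r)])
V_q = np.vstack([quantize(Vt_r[i], bits[i]) for i in range(r)])

# Reconstruct the delta from the quantized low-rank factors.
delta_hat = (U_q * S_r) @ V_q
err = np.linalg.norm(delta - delta_hat) / np.linalg.norm(delta)
print(f"relative reconstruction error: {err:.3f}")
```

The error here comes from two sources the paper analyzes jointly: the discarded singular components and the quantization of the retained factors.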
Computer Science > Machine Learning
arXiv:2506.11087 (cs)
[Submitted on 5 Jun 2025 (v1), last revised 15 Feb 2026 (this version, v3)]
Title: Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
Authors: Boya Xiong, Shuo Wang, Weifeng Ge, Guanhua Chen, Yun Chen
Abstract: Supervised Fine-Tuning (SFT) empowers Large Language Models (LLMs) with exceptional performance on specialized tasks, but it yields dense, high-dimensional delta parameters that pose severe storage and distribution challenges. Singular Value Decomposition (SVD)-based compression offers a compact representation for such delta parameters, but existing methods adopt heuristic quantization without clarifying the underlying mechanisms, leading to poor generalizability. In this work, we propose PrinMix, a rigorous SVD-based framework that models quantization as an optimization problem, grounding the design in mathematical mechanisms. We first theoretically derive the quantization error and identify a key singular-value-dominated scaling mechanism, which mathematically proves the necessity of mixed-precision quantization. We then model the quantization scheme as a 0/1 Integer Linear Programming (ILP) problem, which yields optimal bit-budget-constrained solutions without empirical assumptions. Furthermore, PrinMix integrates ...
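The 0/1 ILP step described in the abstract can be illustrated on a toy instance. Everything below is hypothetical: the cost model (per-component error proportional to the singular value times 2^-bits), the singular values, and the bit budget are invented, and the tiny instance is solved by exhaustive enumeration rather than an actual ILP solver.

```python
import itertools

# Hypothetical per-component singular values and candidate bit widths.
singular_values = [4.0, 2.0, 1.0, 0.5]
bit_choices = [2, 4, 8]

# Toy cost model: quantization error shrinks like 2^-b and is scaled
# by the component's singular value (the "scaling mechanism").
cost = [[s * 2.0 ** (-b) for b in bit_choices] for s in singular_values]

budget = 16  # total bits allowed across all components

best = None
# The 0/1 ILP assigns exactly one bit width to each component subject to
# the budget; this tiny instance is solved exactly by enumeration.
for assign in itertools.product(range(len(bit_choices)),
                                repeat=len(singular_values)):
    total_bits = sum(bit_choices[j] for j in assign)
    if total_bits > budget:
        continue
    total_err = sum(cost[i][j] for i, j in enumerate(assign))
    if best is None or total_err < best[0]:
        best = (total_err, [bit_choices[j] for j in assign])

print("bit allocation:", best[1], "estimated error:", round(best[0], 4))
```

Even on this toy instance the solver's trade-off is visible: upgrading the largest component from 4 to 8 bits would force two other components down to 2 bits, which costs more total error than it saves, so a uniform allocation wins here. At realistic scales one would use a real ILP solver instead of enumeration.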