[2602.10993] LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules
Summary
The paper introduces LoRA-Squeeze, a method that improves Low-Rank Adaptation (LoRA) by training at a higher rank and then compressing the learned modules to a lower rank, either post-hoc or dynamically during training, enhancing both performance and efficiency in fine-tuning.
Why It Matters
LoRA-Squeeze addresses key challenges in parameter-efficient fine-tuning: it eases the choice of rank and reduces the deployment complexity of heterogeneous-rank modules. This matters for model efficiency in AI applications, particularly in NLP and computer vision, where resource constraints are significant.
Key Takeaways
- LoRA-Squeeze allows for dynamic rank adjustments during training.
- Post-hoc compression can outperform direct low-rank training.
- The method shows significant improvements across various tasks in NLP and vision.
arXiv:2602.10993 (cs) — Computer Science > Computation and Language
Submitted on 11 Feb 2026 (v1), last revised 19 Feb 2026 (this version, v2)
Authors: Ivan Vulić, Adam Grycner, Quentin de Laroussilhe, Jonas Pfeiffer
Abstract: Despite its huge number of variants, standard Low-Rank Adaptation (LoRA) is still a dominant technique for parameter-efficient fine-tuning (PEFT). Nonetheless, it faces persistent challenges, including the pre-selection of an optimal rank and rank-specific hyper-parameters, as well as the deployment complexity of heterogeneous-rank modules and more sophisticated LoRA derivatives. In this work, we introduce LoRA-Squeeze, a simple and efficient methodology that aims to improve standard LoRA learning by changing LoRA module ranks either post-hoc or dynamically during training. Our approach posits that it is better to first learn an expressive, higher-rank solution and then compress it, rather than learning a constrained, low-rank solution directly. The method involves fine-tuning with a deliberately high(er) source rank, reconstructing or efficiently approximating the reconstruction of the full weight update matrix, and then using Randomized Singular Value Decomposition (RSVD) t...
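The pipeline the abstract describes (reconstruct the full weight update from a high-rank LoRA module, then truncate it with randomized SVD) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names, shapes, and the symmetric split of singular values between the two factors are assumptions for the example.

```python
import numpy as np

def rsvd(M, rank, oversample=8, seed=0):
    """Randomized SVD: sketch the column space of M, then SVD a small projection."""
    rng = np.random.default_rng(seed)
    # A Gaussian sketch followed by QR captures the dominant range of M.
    Q, _ = np.linalg.qr(M @ rng.standard_normal((M.shape[1], rank + oversample)))
    U_small, S, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ U_small)[:, :rank], S[:rank], Vt[:rank]

def squeeze_lora(B, A, target_rank):
    """Compress a trained LoRA update dW = B @ A to a lower rank.

    B: (d_out, r_src), A: (r_src, d_in). Returns factors of rank target_rank.
    """
    dW = B @ A                       # reconstruct the full weight update matrix
    U, S, Vt = rsvd(dW, target_rank) # rank-truncated approximation of dW
    sqrt_S = np.sqrt(S)              # split singular values between both factors
    return U * sqrt_S, sqrt_S[:, None] * Vt

# Hypothetical shapes: a 64 -> 128 layer, source rank 16, squeezed to rank 4.
B = np.random.default_rng(1).standard_normal((128, 16)) * 0.1
A = np.random.default_rng(2).standard_normal((16, 64)) * 0.1
B2, A2 = squeeze_lora(B, A, target_rank=4)
print(B2.shape, A2.shape)  # (128, 4) (4, 64)
```

The squeezed factors `B2, A2` keep the standard LoRA form, so they drop into existing serving stacks; the in-tuning variant would apply the same truncation periodically during training rather than once at the end.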