[2604.03258] SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression
Computer Science > Computation and Language
arXiv:2604.03258 (cs)
[Submitted on 12 Mar 2026]

Title: SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression
Authors: Xinhao Huang, You-Liang Huang, Zeyi Wen

Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their billion-scale parameter counts pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To facilitate efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named "SoLA", which leverages Soft activation sparsity and Low-rAnk decomposition. Based on our analysis of the activation patterns in the feed-forward networks (FFNs) of modern LLMs, SoLA identifies and retains the minority of components that contribute significantly to inference, while compressing the majority through low-rank decomposition. To alleviate the decomposition loss, SoLA is equipped with an adaptive component-wise low-rank allocation strategy that assigns appropriate truncation positions to different weight matrices. We conduct extensive experiments on LLaMA-2-...
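To make the high-level idea concrete, below is a minimal sketch (not the authors' code) of splitting an FFN weight matrix into a small "retained" part, selected by an activation-importance score, and a large residual part compressed with a truncated SVD. The function names, the scoring rule (mean absolute activation per output unit), and the keep-ratio/rank hyperparameters are illustrative assumptions rather than SoLA's exact procedure.

```python
import numpy as np

def compress_ffn_weight(W, act_scores, keep_ratio=0.05, rank=64):
    """W: (d_out, d_in) FFN weight; act_scores: (d_out,) importance per output unit,
    e.g. mean |activation| measured on a small calibration set (assumed metric)."""
    d_out = W.shape[0]
    n_keep = max(1, int(keep_ratio * d_out))

    # Retain the few rows whose outputs are most active; they stay dense and lossless.
    keep_idx = np.argsort(act_scores)[-n_keep:]
    rest_idx = np.setdiff1d(np.arange(d_out), keep_idx)
    W_keep = W[keep_idx]

    # Low-rank decomposition of the weakly activated majority via truncated SVD.
    U, S, Vt = np.linalg.svd(W[rest_idx], full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (n_rest, rank)
    B = Vt[:rank]                # (rank, d_in)
    return keep_idx, W_keep, rest_idx, A, B

def apply_compressed(x, keep_idx, W_keep, rest_idx, A, B, d_out):
    """Approximate y = W @ x from the compressed factors."""
    y = np.empty(d_out)
    y[keep_idx] = W_keep @ x
    y[rest_idx] = A @ (B @ x)
    return y
```

Under this sketch, storage drops from d_out * d_in parameters to roughly n_keep * d_in + rank * (d_out - n_keep + d_in); the paper's component-wise allocation strategy would presumably choose a different rank (truncation position) per weight matrix rather than the single fixed value used here.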