[2604.03258] SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression



Computer Science > Computation and Language
arXiv:2604.03258 (cs) [Submitted on 12 Mar 2026]
Title: SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression
Authors: Xinhao Huang, You-Liang Huang, Zeyi Wen

Abstract: Large language models (LLMs) have demonstrated impressive capabilities across various tasks, but their billion-scale parameters pose deployment challenges. Although existing methods attempt to reduce the scale of LLMs, they require either special hardware support or expensive post-training to maintain model quality. To enable efficient and affordable model slimming, we propose a novel training-free compression method for LLMs, named "SoLA", which leverages Soft activation sparsity and Low-rAnk decomposition. Based on our analysis of the activation pattern in the feed-forward network (FFN) of modern LLMs, SoLA identifies and retains the minority of components that contribute significantly to inference, while compressing the majority through low-rank decomposition. To alleviate the decomposition loss, SoLA is equipped with an adaptive component-wise low-rank allocation strategy that assigns appropriate truncation positions to different weight matrices. We conduct extensive experiments on LLaMA-2-...
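The core idea in the abstract (keep the few high-contribution FFN components exact, low-rank-compress the rest) can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the activation scores, the keep fraction, and the fixed truncation rank are all placeholder assumptions (the paper's adaptive rank allocation is not reproduced here).

```python
import numpy as np

def sola_style_compress(W, act_scores, keep_frac=0.1, rank=16):
    """Hypothetical sketch of SoLA-style hybrid compression.

    W          : (d, n) weight matrix of one FFN projection.
    act_scores : (n,) importance score per column (e.g. mean |activation|);
                 how SoLA actually scores components is simplified here.
    Keeps the top `keep_frac` columns exactly and replaces the remaining
    columns with a rank-`rank` truncated SVD approximation.
    """
    n_keep = max(1, int(keep_frac * W.shape[1]))
    order = np.argsort(-act_scores)              # highest score first
    keep_idx, low_idx = order[:n_keep], order[n_keep:]

    # Truncated SVD of only the low-importance columns.
    U, S, Vt = np.linalg.svd(W[:, low_idx], full_matrices=False)
    A = U[:, :rank] * S[:rank]                   # (d, rank) factor
    B = Vt[:rank]                                # (rank, n - n_keep) factor
    return keep_idx, W[:, keep_idx], low_idx, A, B

def reconstruct(shape, keep_idx, W_keep, low_idx, A, B):
    """Rebuild the dense matrix from the compressed pieces (for inspection)."""
    W_hat = np.zeros(shape)
    W_hat[:, keep_idx] = W_keep                  # retained columns are exact
    W_hat[:, low_idx] = A @ B                    # low-rank approximation
    return W_hat
```

In this sketch the parameter saving comes from storing the low-importance columns as two thin factors (`d*rank + rank*(n - n_keep)` values instead of `d*(n - n_keep)`), while the retained columns are reproduced exactly.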

Originally published on April 07, 2026. Curated by AI News.

