[2510.01650] The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
Summary
This paper presents a novel method, Elsa, for achieving extreme sparsity in large language models (LLMs) without sacrificing accuracy, addressing limitations in current pruning techniques.
Why It Matters
As LLMs grow in size and complexity, their computational demands increase, making efficient model pruning essential. This research demonstrates that sparsity levels of up to 90% are achievable without severe accuracy loss, which could significantly enhance the deployment of LLMs in resource-constrained environments.
Key Takeaways
- Elsa achieves up to 90% sparsity in LLMs while maintaining high accuracy.
- The method addresses limitations of traditional surrogate objective formulations.
- Significant improvements in perplexity and inference speed are demonstrated.
- Elsa's quantized variant can scale to extremely large models.
- This research opens avenues for further advancements in model sparsity.
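To put the sparsity figures above in perspective, a quick back-of-envelope sketch of how many nonzero weights remain at each level (the 7B parameter count is an illustrative assumption, not a model from the paper):

```python
# Nonzero weights remaining at a given unstructured sparsity level.
# The 7B total is an illustrative assumption, not taken from the paper.
def nonzero_params(total_params, sparsity):
    return round(total_params * (1.0 - sparsity))

total = 7_000_000_000
for s in (0.5, 0.6, 0.9):
    print(f"{s:.0%} sparsity -> {nonzero_params(total, s):,} nonzeros")
# 50% sparsity -> 3,500,000,000 nonzeros
# 60% sparsity -> 2,800,000,000 nonzeros
# 90% sparsity -> 700,000,000 nonzeros
```

Going from the 50-60% levels of prior methods to 90% thus removes roughly three quarters of the weights that those methods would keep.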
Computer Science > Machine Learning
arXiv:2510.01650 (cs)
[Submitted on 2 Oct 2025 (v1), last revised 23 Feb 2026 (this version, v2)]
Title: The Unseen Frontier: Pushing the Limits of LLM Sparsity with Surrogate-Free ADMM
Authors: Kwanhee Lee, Hyeondo Jang, Dongyeop Lee, Dan Alistarh, Namhoon Lee
Abstract: Neural network pruning is a promising technique to mitigate the excessive computational and memory requirements of large language models (LLMs). Despite its promise, however, progress in this area has diminished, as conventional methods are seemingly unable to surpass moderate sparsity levels (50-60%) without severely degrading model accuracy. This work breaks through the current impasse, presenting a principled and effective method called $\texttt{Elsa}$, which achieves extreme sparsity levels of up to 90% while retaining high model fidelity. This is done by identifying several limitations in current practice, all of which can be traced back to their reliance on a surrogate objective formulation. $\texttt{Elsa}$ tackles this issue directly and effectively via standard and well-established constrained optimization techniques based on ADMM. Our extensive experiments across a wide range of models and scales show that $\texttt{Elsa}$ achieves substantial improvements over existing methods; e.g., it achi...
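The abstract describes Elsa as treating sparsity as an explicit constraint handled with standard ADMM, rather than optimizing a surrogate objective. The paper's actual formulation is not reproduced here, but the general ADMM pattern it invokes can be sketched on a toy sparsity-constrained least-squares problem: alternate a dense quadratic step, a projection onto the top-k sparsity set, and a dual update. All names, the objective, and the hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_topk(w, k):
    """Project onto {w : ||w||_0 <= k} by keeping the k largest-magnitude entries."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_sparse_least_squares(A, b, sparsity=0.9, rho=1.0, iters=100):
    """Toy ADMM for min (1/2)||Aw - b||^2  s.t.  ||w||_0 <= k.

    Splits the problem into a dense variable w (closed-form quadratic step),
    a constrained copy z (hard projection onto the sparsity set), and a
    scaled dual variable u tying them together.
    """
    n = A.shape[1]
    k = max(1, round(n * (1.0 - sparsity)))
    z = np.zeros(n)
    u = np.zeros(n)
    # Precompute the w-update system matrix (A^T A + rho I) and A^T b.
    M = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        w = np.linalg.solve(M, Atb + rho * (z - u))  # dense quadratic step
        z = project_topk(w + u, k)                   # sparsity projection
        u = u + w - z                                # dual update
    return z

# Toy usage: fit a 90%-sparse weight vector (at most 5 of 50 nonzeros).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 50))
w_true = np.zeros(50)
w_true[:5] = rng.standard_normal(5)
b = A @ w_true
w_hat = admm_sparse_least_squares(A, b, sparsity=0.9)
print(np.count_nonzero(w_hat))  # at most 5 nonzeros
```

The projection step is what makes the constraint exact rather than approximated by a surrogate penalty; the trade-off is that the constraint set is nonconvex, so convergence guarantees are weaker than in convex ADMM.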