[2402.15751] Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Summary
The paper introduces Sparse MeZO, a novel optimization technique for fine-tuning large language models (LLMs) that reduces memory usage while improving performance and convergence speed.
Why It Matters
As large language models become increasingly prevalent, optimizing their fine-tuning processes is crucial for efficiency and performance. Sparse MeZO addresses memory inefficiencies and enhances convergence, making it a significant advancement in the field of machine learning.
Key Takeaways
- Sparse MeZO optimizes memory usage by applying zeroth-order optimization selectively.
- The technique improves fine-tuning performance and convergence speed compared to standard MeZO.
- Experimental results show up to a 9% absolute accuracy improvement and a 3.5x speedup over standard MeZO on specific tasks.
Computer Science > Machine Learning
arXiv:2402.15751 (cs)
[Submitted on 24 Feb 2024 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You
Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, compared with exact gradients, ZO-based gradients usually exhibit an estimation error, which can significantly hurt the optimization process, leading to slower convergence and suboptimal solutions. In addition, we find that the estimation error hurts more when it is added to large weights than to small weights. Based on this observation, this paper introduces Sparse MeZO, a novel memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We propose a simple yet effective parameter selection scheme that yields significant performance gains with Sparse-MeZO. Additional...
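The core mechanism described above — a two-forward-pass (SPSA-style) zeroth-order gradient estimate applied only to a masked subset of parameters — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `sparse_mezo_step` and the concrete mask rule (keeping only the smallest-magnitude weights, motivated by the paper's observation that estimation error hurts large weights more) are assumptions for the example.

```python
import numpy as np

def sparse_mezo_step(params, loss_fn, lr=1e-3, eps=1e-3, sparsity=0.75, seed=0):
    """One zeroth-order (SPSA-style) update restricted to a parameter subset.

    Hypothetical sketch: only the (1 - sparsity) fraction of smallest-magnitude
    weights are perturbed; the directional gradient is estimated from two
    forward passes, so no back-propagation (and no activation storage) is needed.
    """
    rng = np.random.default_rng(seed)

    # Build a sparse mask selecting the smallest-magnitude weights.
    k = int(len(params) * (1 - sparsity))
    mask = np.zeros_like(params)
    mask[np.argsort(np.abs(params))[:k]] = 1.0

    # Random perturbation restricted to the masked subset.
    z = rng.standard_normal(params.shape) * mask

    # Two forward passes estimate the directional derivative along z.
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    grad_scale = (loss_plus - loss_minus) / (2 * eps)

    # SGD-style update; only masked weights move, since z is zero elsewhere.
    return params - lr * grad_scale * z
```

For example, on a toy quadratic loss `loss_fn = lambda p: float(np.sum(p ** 2))`, a step leaves the large (unmasked) weights untouched and nudges only the small ones, mirroring the selective-update idea.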