[2402.15751] Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Summary
The paper introduces Sparse MeZO, a novel optimization technique for fine-tuning large language models (LLMs) that reduces memory usage while improving performance and convergence speed.
Why It Matters
As large language models become increasingly prevalent, optimizing their fine-tuning processes is crucial for efficiency and performance. Sparse MeZO addresses memory inefficiencies and enhances convergence, making it a significant advancement in the field of machine learning.
Key Takeaways
- Sparse MeZO optimizes memory usage by applying zeroth-order optimization selectively.
- The technique improves fine-tuning performance and convergence speed compared to standard MeZO.
- Experimental results show up to a 9% absolute accuracy improvement and a 3.5x speedup over standard MeZO on specific tasks.
Computer Science > Machine Learning
arXiv:2402.15751 (cs)
[Submitted on 24 Feb 2024 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Authors: Yong Liu, Zirui Zhu, Chaoyu Gong, Minhao Cheng, Cho-Jui Hsieh, Yang You
Abstract: While fine-tuning large language models (LLMs) for specific tasks often yields impressive results, it comes at the cost of memory inefficiency due to back-propagation in gradient-based training. Memory-efficient Zeroth-order (MeZO) optimizers, recently proposed to address this issue, only require forward passes during training, making them more memory-friendly. However, compared with exact gradients, ZO-based gradients usually exhibit an estimation error, which can significantly hurt the optimization process, leading to slower convergence and suboptimal solutions. In addition, we find that the estimation error hurts more when it is added to large weights than to small weights. Based on this observation, this paper introduces Sparse MeZO, a novel memory-efficient zeroth-order optimization approach that applies ZO only to a carefully chosen subset of parameters. We propose a simple yet effective parameter selection scheme that yields significant performance gains with Sparse-MeZO. Additional...
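The core mechanism described above — a two-forward-pass (SPSA-style) zeroth-order gradient estimate applied only to a masked subset of parameters — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `sparse_mezo_step` and the concrete mask rule (keeping only the smallest-magnitude weights, motivated by the paper's observation that estimation error hurts large weights more) are assumptions for the example.

```python
import numpy as np

def sparse_mezo_step(params, loss_fn, lr=1e-3, eps=1e-3, sparsity=0.75, seed=0):
    """One zeroth-order (SPSA-style) update restricted to a parameter subset.

    Hypothetical sketch: only the (1 - sparsity) fraction of smallest-magnitude
    weights are perturbed; the directional gradient is estimated from two
    forward passes, so no back-propagation (and no activation storage) is needed.
    """
    rng = np.random.default_rng(seed)

    # Build a sparse mask selecting the smallest-magnitude weights.
    k = int(len(params) * (1 - sparsity))
    mask = np.zeros_like(params)
    mask[np.argsort(np.abs(params))[:k]] = 1.0

    # Random perturbation restricted to the masked subset.
    z = rng.standard_normal(params.shape) * mask

    # Two forward passes estimate the directional derivative along z.
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    grad_scale = (loss_plus - loss_minus) / (2 * eps)

    # SGD-style update; only masked weights move, since z is zero elsewhere.
    return params - lr * grad_scale * z
```

For example, on a toy quadratic loss `loss_fn = lambda p: float(np.sum(p ** 2))`, a step leaves the large (unmasked) weights untouched and nudges only the small ones, mirroring the selective-update idea.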