[2602.13659] Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling

arXiv - Machine Learning · 4 min read

Summary

This article presents a novel zero-order optimization framework for fine-tuning large language models (LLMs) using learnable direction sampling, addressing memory constraints in resource-limited environments.

Why It Matters

As the demand for fine-tuning large pretrained language models increases, traditional methods face limitations due to high memory usage. This research proposes a more efficient approach that could enable broader deployment of LLMs in various applications, particularly where resources are constrained.

Key Takeaways

  • Introduces a zero-order optimization framework that reduces memory demands for LLM fine-tuning.
  • Utilizes learnable direction sampling to improve the quality of gradient information.
  • Demonstrates improved performance on LLM fine-tuning benchmarks compared to standard methods.
  • Addresses the high variance and dimensionality issues associated with classical zero-order methods.
  • Provides a theoretical analysis supporting the effectiveness of the proposed approach.
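The classical zero-order estimator the takeaways refer to can be sketched in a few lines. This is a generic two-point (Gaussian-smoothing) estimator, not code from the paper; the function name and toy objective are illustrative:

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, rng=None):
    """Classical two-point zero-order gradient estimate.

    Approximates grad f(x) from two forward evaluations only, with no
    backpropagation: g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u,
    where u ~ N(0, I). The estimate is unbiased (for a smoothed version
    of f), but its variance grows with the dimension d, which is the
    limitation the paper targets.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)                  # random direction
    g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)    # directional derivative
    return g * u

# Toy usage: for f(x) = ||x||^2 the true gradient at x is 2*x.
f = lambda x: float(np.dot(x, x))
x = np.array([1.0, -2.0, 0.5])
g_hat = zo_gradient(f, x, rng=np.random.default_rng(0))
```

A single estimate is very noisy; only the average over many sampled directions approaches the true gradient, which is why variance reduction matters at LLM scale.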

Computer Science > Machine Learning

arXiv:2602.13659 (cs) · Submitted on 14 Feb 2026

Title: Zero-Order Optimization for LLM Fine-Tuning via Learnable Direction Sampling

Authors: Valery Parfenov, Grigoriy Evseev, Andrey Veprikov, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov

Abstract: Fine-tuning large pretrained language models (LLMs) is a cornerstone of modern NLP, yet its growing memory demands (driven by backpropagation and large optimizer states) limit deployment in resource-constrained settings. Zero-order (ZO) methods bypass backpropagation by estimating directional derivatives from forward evaluations, offering substantial memory savings. However, classical ZO estimators suffer from high variance and an adverse dependence on the parameter dimensionality $d$, which has constrained their use to low-dimensional problems. In this work, we propose a policy-driven ZO framework that treats the sampling distribution over perturbation directions as a learnable policy and updates it to reduce the variance of directional estimates. We develop a practical algorithm implementing this idea and provide a theoretical analysis, showing that learned sampling distributions improve the quality of gradient information and relax the explicit dependence on $d$ in convergence bounds. Empirically, we validate...
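The abstract's core idea, sampling directions from a learnable distribution rather than a fixed isotropic Gaussian, can be illustrated with a per-coordinate scale that adapts between steps. The scale update below is a simple heuristic for illustration only (the paper's actual policy update is not specified in this summary), and all names and hyperparameters are assumptions:

```python
import numpy as np

def policy_zo_step(f, x, log_sigma, mu=1e-3, lr_x=0.02, lr_sigma=0.05, rng=None):
    """One ZO descent step with a learnable per-coordinate sampling scale.

    Directions are drawn from N(0, diag(sigma^2)) with sigma = exp(log_sigma);
    dividing the estimate by sigma^2 keeps it unbiased under the reweighted
    distribution. The scale update is an illustrative heuristic (grow scales
    where the estimate carries signal), not the paper's policy update.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.exp(log_sigma)
    u = sigma * rng.standard_normal(x.shape)        # u ~ N(0, diag(sigma^2))
    delta = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)
    g = delta * u / sigma**2                        # unbiased gradient estimate
    score = np.abs(g)
    # Nudge scales toward informative coordinates; clip for stability.
    log_sigma = np.clip(
        log_sigma + lr_sigma * (score / (score.mean() + 1e-12) - 1.0), -0.5, 0.5
    )
    return x - lr_x * g, log_sigma

# Toy run: minimize f(x) = ||x||^2 using forward evaluations only.
f = lambda x: float(np.dot(x, x))
x, log_sigma = np.array([1.0, -2.0, 0.5]), np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(800):
    x, log_sigma = policy_zo_step(f, x, log_sigma, rng=rng)
```

The point of the sketch is the separation of concerns: the perturbation distribution is itself a trainable object, updated alongside the parameters to lower estimator variance.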
