[2602.24283] Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation

arXiv - Machine Learning 4 min read

About this article


Computer Science > Machine Learning
arXiv:2602.24283 (cs) [Submitted on 27 Feb 2026]

Title: Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

Abstract: Modern optimizers like Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces significant memory overhead, which constrains scalability and computational efficiency. In this work, we reframe the exponential moving average (EMA) used in these momenta as the training of a linear regressor via online gradient flow. Building on this equivalence, we introduce LoRA-Pre, a novel low-rank optimizer designed for efficient pre-training. Specifically, LoRA-Pre reduces the optimizer's memory footprint by decomposing the full momentum matrix into a compact low-rank subspace within the online linear learner, thereby maintaining optimization performance while improving memory efficiency. We empirically validate LoRA-Pre's efficacy by pre-training models from the Llama architecture family, scaling from 60M to 1B parameters. LoRA-Pre achieves the highest performance across all model sizes. Notably, LoRA-Pre demonstrates remarkable rank efficiency, achieving comparable or superior results using only 1/8 the rank of...
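The truncated abstract does not spell out LoRA-Pre's actual update rule, so the sketch below is only a generic illustration of the core idea it describes: replacing a full-size EMA momentum matrix with a compact low-rank state. All function and variable names here (`ema_momentum`, `low_rank_momentum_step`, the fixed basis `P`) are hypothetical, not from the paper.

```python
import numpy as np

def ema_momentum(m, grad, beta=0.9):
    """Standard EMA momentum: the state m is as large as the parameter matrix."""
    return beta * m + (1.0 - beta) * grad

def low_rank_momentum_step(m_low, grad, P, beta=0.9):
    """Generic low-rank momentum sketch (NOT the paper's exact algorithm).

    Instead of an (n x k) momentum matrix, keep only an (r x k) state:
    the EMA of the gradient projected by a fixed orthonormal basis P (n x r).
    """
    g_low = P.T @ grad                    # project gradient into rank-r subspace
    m_low = beta * m_low + (1.0 - beta) * g_low
    update = P @ m_low                    # map compact state back to full shape
    return m_low, update

rng = np.random.default_rng(0)
n, k, r = 64, 32, 8                       # full dims vs. rank: 8x smaller state
P, _ = np.linalg.qr(rng.standard_normal((n, r)))  # fixed orthonormal basis
m_low = np.zeros((r, k))
grad = rng.standard_normal((n, k))
m_low, update = low_rank_momentum_step(m_low, grad, P)
print(m_low.shape, update.shape)          # (8, 32) (64, 32)
```

With `r = n / 8`, the optimizer state shrinks by the same factor of 8 that the abstract cites for rank efficiency, while the update delivered to the parameters keeps its full shape.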

Originally published on March 02, 2026. Curated by AI News.


