[2602.24283] Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Computer Science > Machine Learning
arXiv:2602.24283 (cs) [Submitted on 27 Feb 2026]

Title: Taming Momentum: Rethinking Optimizer States Through Low-Rank Approximation
Authors: Zhengbo Wang, Jian Liang, Ran He, Zilei Wang, Tieniu Tan

Abstract: Modern optimizers such as Adam and Muon are central to training large language models, but their reliance on first- and second-order momenta introduces significant memory overhead, constraining scalability and computational efficiency. In this work, we reframe the exponential moving average (EMA) used in these momenta as the training of a linear regressor via online gradient flow. Building on this equivalence, we introduce LoRA-Pre, a novel low-rank optimizer designed for efficient pre-training. Specifically, LoRA-Pre reduces the optimizer's memory footprint by decomposing the full momentum matrix into a compact low-rank subspace within the online linear learner, maintaining optimization performance while improving memory efficiency. We empirically validate LoRA-Pre by pre-training Llama-family models scaling from 60M to 1B parameters. LoRA-Pre achieves the highest performance across all model sizes. Notably, LoRA-Pre demonstrates remarkable rank efficiency, achieving comparable or superior results using only 1/8 the rank of...
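The abstract's central observation, that an EMA is equivalent to online gradient descent by a linear learner, can be checked directly. The sketch below (an illustration of the equivalence, not the paper's implementation; the loss `L(m) = 0.5 * ||m - g_t||^2` and step size `1 - beta` are standard identities, not taken from the paper) shows that the EMA recursion `m_t = beta * m_{t-1} + (1 - beta) * g_t` is exactly one gradient step on the instantaneous quadratic loss:

```python
import numpy as np

# EMA as online gradient descent:
#   L(m) = 0.5 * ||m - g_t||^2  has gradient  (m - g_t),
#   so a GD step with learning rate eta = 1 - beta gives
#   m <- m - eta * (m - g_t) = beta * m + (1 - beta) * g_t,
# which is precisely the EMA momentum update.

rng = np.random.default_rng(0)
beta = 0.9
m_ema = np.zeros(4)  # momentum tracked via the EMA recursion
m_gd = np.zeros(4)   # momentum tracked via explicit gradient steps

for _ in range(100):
    g = rng.standard_normal(4)                 # incoming stochastic gradient
    m_ema = beta * m_ema + (1 - beta) * g      # standard EMA update
    m_gd = m_gd - (1 - beta) * (m_gd - g)      # GD step on L(m)

# The two trajectories are identical at every step.
assert np.allclose(m_ema, m_gd)
print("EMA and online gradient descent coincide:", m_ema)
```

Under this view, restricting the linear learner to a rank-r subspace (as LoRA-Pre does with the full momentum matrix) shrinks the optimizer state from O(n*m) to O((n+m)*r); the paper's specific decomposition and update rule are not detailed in the abstract.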