[2604.01472] The Newton-Muon Optimizer
Mathematics > Optimization and Control
arXiv:2604.01472 (math)
[Submitted on 1 Apr 2026]

Title: The Newton-Muon Optimizer
Authors: Zhehang Du, Weijie Su

Abstract: The Muon optimizer has received considerable attention for its strong performance in training large language models, yet the design principle behind its matrix-gradient orthogonalization remains largely elusive. In this paper, we introduce a surrogate model that not only sheds new light on the design of Muon, but more importantly leads to a new optimizer. In the same spirit as the derivation of Newton's method, the surrogate approximates the loss as a quadratic function of the perturbation to a weight matrix $W$ using only three matrices: the gradient $G$, an output-space curvature matrix $H$, and the data matrix $Z$ that stacks the layer inputs. By minimizing this surrogate in one step and adopting a certain isotropic assumption on the weights, we obtain the closed-form update rule (up to momentum and weight decay) $W \leftarrow W - \eta \cdot \mathrm{msgn}(G(ZZ^\top)^{-1})$, where $\eta$ is the learning rate and $\mathrm{msgn}(X)=UV^\top$ if $X=USV^\top$ is a compact singular value decomposition. This new optimization method, which we refer to as Newton-Muon, shows that standard Muon can be interpreted as an implicit Newton-type method that neglects the right preconditioning induced by the input secon...
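The closed-form update above can be sketched in NumPy. This is a minimal illustration of the formula only, not the authors' implementation: it assumes shapes $G \in \mathbb{R}^{m \times n}$, $Z \in \mathbb{R}^{n \times b}$ (so that $ZZ^\top$ is $n \times n$), adds a small damping term `eps` for invertibility (an assumption, not stated in the abstract), and omits the momentum and weight decay the abstract mentions.

```python
import numpy as np

def msgn(X):
    """Matrix sign: msgn(X) = U V^T, where X = U S V^T is a compact SVD."""
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def newton_muon_step(W, G, Z, eta, eps=1e-6):
    """One Newton-Muon update: W <- W - eta * msgn(G (Z Z^T)^{-1}).

    eps is a hypothetical damping term added here so Z Z^T is invertible.
    """
    ZZt = Z @ Z.T + eps * np.eye(Z.shape[0])
    # Compute G (Z Z^T)^{-1} by solving a linear system instead of inverting;
    # Z Z^T is symmetric, so solve(ZZt, G.T).T equals G @ inv(ZZt).
    precond_grad = np.linalg.solve(ZZt, G.T).T
    return W - eta * msgn(precond_grad)
```

Note that `msgn` returns a semi-orthogonal matrix (its columns or rows are orthonormal, whichever is the shorter dimension), so every singular value of the applied update direction equals one, as in standard Muon; Newton-Muon differs only in the right preconditioner $(ZZ^\top)^{-1}$ applied to $G$ first.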