[2604.01563] Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training




Computer Science > Artificial Intelligence
arXiv:2604.01563 (cs)
[Submitted on 2 Apr 2026]

Title: Does Your Optimizer Care How You Normalize? Normalization-Optimizer Coupling in LLM Training
Authors: Abdelrahman Abouzeid (Georgia Institute of Technology)

Abstract: In LLM training, normalization layers and optimizers are typically treated as independent design choices. In a 3x2 factorial experiment at 1B parameters and 1000 training steps, we show this assumption can fail: Dynamic Erf (Derf; Chen & Liu, 2025) suffers a large negative interaction with Muon (Jordan, 2024), with its gap to RMSNorm growing from +0.31 nats under AdamW to +0.97 nats under Muon, approximately three times larger. Dynamic Tanh (DyT; Zhu et al., 2025), included as a bounded-normalizer control, shows no such penalty. Our evidence points to two failure modes of erf under Muon's faster spectral-norm growth: saturation (lossy compression) and scale blindness (discarding activation magnitude). An EMA blend that reintroduces running scale estimates recovers ~84% of the gap. Separately, reducing Derf's alpha from its published default of 0.5 to 0.3 recovers ~80% by keeping erf in its near-linear regime, where it approximately preserves relative scale; this setting is not the published default of Chen & Liu (2025). Usin...
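The abstract's alpha result can be illustrated with a small sketch. Assuming the Derf nonlinearity applies an element-wise erf(alpha * x) squash (a simplification; the actual layer of Chen & Liu, 2025 presumably also carries learned scale and shift parameters), a smaller alpha keeps activations inside erf's near-linear region, so the relative magnitudes of small and large activations survive the squash better:

```python
import math

def erf_squash(xs, alpha):
    # Simplified Derf-style nonlinearity: element-wise erf(alpha * x).
    # Hypothetical illustration only, not the paper's implementation.
    return [math.erf(alpha * x) for x in xs]

# Activations spanning an 8x magnitude range.
xs = [0.5, 1.0, 2.0, 4.0]

for alpha in (0.5, 0.3):  # published default vs. the reduced setting
    ys = erf_squash(xs, alpha)
    # Output ratio between the largest and smallest activation;
    # 8.0 would mean relative scale is perfectly preserved.
    ratio = ys[-1] / ys[0]
    print(f"alpha={alpha}: output ratio = {ratio:.2f}")
```

With alpha=0.5 the largest input (erf(2.0) ≈ 0.995) is already deep in saturation, compressing the 8x input range to roughly a 3.6x output range; with alpha=0.3 more of that relative scale survives (roughly 5.4x). This is consistent with the abstract's framing of saturation as lossy compression.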

Originally published on April 03, 2026. Curated by AI News.
