[2603.00541] Spectral Condition for $\mu$P under Width-Depth Scaling

arXiv - Machine Learning

arXiv:2603.00541 (cs), submitted on 28 Feb 2026

Title: Spectral Condition for $\mu$P under Width-Depth Scaling

Authors: Chenyu Zheng, Rongzhen Wang, Xinyu Zhang, Chongxuan Li

Abstract: Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model sizes. While maximal update parameterization ($\mu$P) has provided a principled solution to both problems for width scaling, existing extensions to the joint width-depth scaling regime remain fragmented, architecture- and optimizer-specific, and often rely on technically involved theories. In this work, we develop a simple and unified spectral framework for $\mu$P under joint width-depth scaling. Considering residual networks of varying block depths, we first introduce a spectral $\mu$P condition that precisely characterizes how the norms of weights and their per-step updates should scale with width and depth, unifying previously disparate $\mu$P formulations as special cases. Building on this condition, we then derive a general recipe for implementing $\mu$P across a broad class of optimizers by mapping the spectral constraints to concrete HP parameterizations. This approach not only recovers existing $\mu$P formulations ...
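The abstract does not state the joint width-depth condition itself. As a point of reference only, the earlier width-only spectral condition of Yang, Simon and Bernstein (2023) asks that each hidden weight matrix $W$ and each per-step update $\Delta W$ have spectral norm of order $\sqrt{\text{fan\_out}/\text{fan\_in}}$. The NumPy sketch below illustrates that width-only condition, not the paper's method: the function names and the width-independent constant `eta` are hypothetical, and no depth-dependent factor is included because the abstract does not specify one.

```python
import numpy as np


def spectral_norm(W: np.ndarray) -> float:
    """Largest singular value of a 2-D array (ord=2 matrix norm)."""
    return float(np.linalg.norm(W, 2))


def width_only_target(fan_in: int, fan_out: int) -> float:
    """Width-only spectral target sqrt(fan_out / fan_in).

    Assumption: this is the width-scaling condition of Yang, Simon and
    Bernstein (2023), NOT the joint width-depth condition of arXiv:2603.00541.
    """
    return float(np.sqrt(fan_out / fan_in))


def spectral_init(fan_in: int, fan_out: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: draw a Gaussian matrix, then rescale it so that
    its spectral norm equals the width-only target exactly."""
    W = rng.standard_normal((fan_out, fan_in))
    return W * (width_only_target(fan_in, fan_out) / spectral_norm(W))


def rescale_update(dW: np.ndarray, fan_in: int, fan_out: int, eta: float = 1.0) -> np.ndarray:
    """Hypothetical helper: rescale a raw update so that
    ||dW||_2 = eta * sqrt(fan_out / fan_in), with eta independent of width."""
    norm = spectral_norm(dW)
    if norm == 0.0:
        return dW  # nothing to rescale for a zero update
    return dW * (eta * width_only_target(fan_in, fan_out) / norm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # For square hidden layers the target is 1 at every width.
    for width in (256, 1024):
        W = spectral_init(width, width, rng)
        print(width, round(spectral_norm(W), 6))
```

Normalizing by the measured spectral norm makes the target hold exactly at every width; a practical parameterization would instead set initialization scales and per-layer learning rates so that the condition holds up to constant factors, which is the kind of mapping to HP parameterizations the abstract describes.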

Originally published on March 03, 2026. Curated by AI News.
