[2603.00541] Spectral Condition for $μ$P under Width-Depth Scaling
Computer Science > Machine Learning
arXiv:2603.00541 (cs)
[Submitted on 28 Feb 2026]

Title: Spectral Condition for $\mu$P under Width-Depth Scaling
Authors: Chenyu Zheng, Rongzhen Wang, Xinyu Zhang, Chongxuan Li

Abstract: Generative foundation models are increasingly scaled in both width and depth, posing significant challenges for stable feature learning and reliable hyperparameter (HP) transfer across model sizes. While maximal update parameterization ($\mu$P) has provided a principled solution to both problems for width scaling, existing extensions to the joint width-depth scaling regime remain fragmented, architecture- and optimizer-specific, and often rely on technically involved theories. In this work, we develop a simple and unified spectral framework for $\mu$P under joint width-depth scaling. Considering residual networks of varying block depths, we first introduce a spectral $\mu$P condition that precisely characterizes how the norms of weights and their per-step updates should scale with width and depth, unifying previously disparate $\mu$P formulations as special cases. Building on this condition, we then derive a general recipe for implementing $\mu$P across a broad class of optimizers by mapping the spectral constraints to concrete HP parameterizations. This approach not only recovers existing $\mu$P formulations ...
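To make the abstract's notion of a spectral condition concrete, the sketch below illustrates the general idea of prescribing target spectral norms for a residual block's weights and per-step updates as a function of width and depth. The width part, $\|W\|_2 = \Theta(\sqrt{n_\text{out}/n_\text{in}})$, follows the known spectral condition for width-only $\mu$P; the depth factor `depth ** (-alpha)` and the helper names are placeholders for illustration only, not the paper's actual scaling or API.

```python
import numpy as np

def spectral_mup_targets(n_in: int, n_out: int, depth: int, alpha: float = 0.5):
    """Hypothetical target spectral norms for a residual block's weight W
    and its per-step update dW under a width-depth-aware spectral condition.

    width_scale follows the width-only spectral muP condition,
    ||W||_2 = Theta(sqrt(n_out / n_in)); the depth factor depth**(-alpha)
    is an assumed placeholder (e.g. alpha = 0.5 mimics 1/sqrt(L) residual
    scaling), NOT the exponent derived in the paper.
    """
    width_scale = np.sqrt(n_out / n_in)
    depth_scale = depth ** (-alpha)
    target = width_scale * depth_scale
    # Same target is used for the weight and its update in this sketch.
    return target, target

def project_to_spectral_norm(W: np.ndarray, target: float) -> np.ndarray:
    """Rescale W so its spectral norm (largest singular value) equals target."""
    sigma_max = np.linalg.norm(W, ord=2)  # ord=2 on a matrix = top singular value
    return W * (target / sigma_max)

# Example: one residual block in a depth-16 network, widening 256 -> 512.
rng = np.random.default_rng(0)
n_in, n_out, depth = 256, 512, 16
w_target, dw_target = spectral_mup_targets(n_in, n_out, depth)
W = project_to_spectral_norm(rng.standard_normal((n_out, n_in)), w_target)
print(np.linalg.norm(W, ord=2))  # matches w_target up to float error
```

Under this kind of recipe, a concrete HP parameterization would back out per-layer initialization scales and learning rates so that each optimizer's updates respect the prescribed spectral targets; the exact mapping is optimizer-dependent and is what the paper's general recipe supplies.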