[2602.16642] Optimizer choice matters for the emergence of Neural Collapse

arXiv - Machine Learning · 4 min read

Summary

This paper explores how the choice of optimizer affects the emergence of Neural Collapse (NC) in deep neural networks, introducing a new metric for analysis and providing theoretical and empirical insights.

Why It Matters

Understanding the impact of optimizer choice on Neural Collapse is crucial for improving deep learning models. This research challenges the assumption that NC is universal across optimizers, highlighting the importance of optimizer characteristics in model performance and training dynamics.

Key Takeaways

  • Optimizer choice significantly influences the emergence of Neural Collapse in neural networks.
  • The newly introduced diagnostic metric NC0 is easier to analyze theoretically, and its convergence to zero is a necessary condition for Neural Collapse.
  • Different optimizers exhibit qualitatively different dynamics regarding Neural Collapse, particularly with weight decay coupling.

Computer Science > Machine Learning
arXiv:2602.16642 (cs) · Submitted on 18 Feb 2026

Title: Optimizer choice matters for the emergence of Neural Collapse
Authors: Jim Zhao, Tin Sum Cheng, Wojciech Masarczyk, Aurelien Lucchi

Abstract: Neural Collapse (NC) refers to the emergence of highly symmetric geometric structures in the representations of deep neural networks during the terminal phase of training. Despite its prevalence, the theoretical understanding of NC remains limited. Existing analyses largely ignore the role of the optimizer, thereby suggesting that NC is universal across optimization methods. In this work, we challenge this assumption and demonstrate that the choice of optimizer plays a critical role in the emergence of NC. The phenomenon is typically quantified through NC metrics, which, however, are difficult to track and analyze theoretically. To overcome this limitation, we introduce a novel diagnostic metric, NC0, whose convergence to zero is a necessary condition for NC. Using NC0, we provide theoretical evidence that NC cannot emerge under decoupled weight decay in adaptive optimizers, as implemented in AdamW. Concretely, we prove that SGD, SignGD with coupled weight decay (a special case of Adam), and SignGD with decoupled weight decay (a special case of AdamW) exhibit qualitatively different NC0 dynamics. Also, we show the...
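The abstract's central contrast is between coupled weight decay (the decay term enters the gradient before the sign nonlinearity, as in Adam) and decoupled weight decay (the decay is applied directly to the weights, outside the sign, as in AdamW). A minimal NumPy sketch of a SignGD step under both conventions illustrates the structural difference; this is a generic illustration of the two update rules, not the paper's exact formulation, and the function name and parameters are chosen here for exposition.

```python
import numpy as np

def signgd_step(w, grad, lr=0.1, wd=0.01, decoupled=False):
    """One SignGD step with coupled or decoupled weight decay.

    Coupled (Adam-style): wd * w is added to the gradient *before* the
    sign, so the decay's magnitude is erased by the sign nonlinearity.
    Decoupled (AdamW-style): wd * w is subtracted *outside* the sign,
    shrinking the weights multiplicatively at every step.
    """
    if decoupled:
        return w - lr * (np.sign(grad) + wd * w)
    return w - lr * np.sign(grad + wd * w)

# With a zero gradient the difference is stark: the coupled update still
# moves each weight by a full lr-sized step toward zero, while the
# decoupled update shrinks the weights by the factor (1 - lr * wd).
w = np.array([1.0, -2.0])
g = np.zeros_like(w)
print(signgd_step(w, g, decoupled=False))  # -> [ 0.9 -1.9 ]
print(signgd_step(w, g, decoupled=True))   # -> [ 0.999 -1.998]
```

This sign-induced asymmetry is exactly why the two variants can exhibit qualitatively different dynamics in a metric like NC0, even though both nominally apply "weight decay."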

