[2602.01308] Dispelling the Curse of Singularities in Neural Network Optimizations
Summary
This article explores the optimization instability in deep neural networks caused by singularities in the parametric space, proposing a method called Parametric Singularity Smoothing (PSS) to mitigate these issues.
Why It Matters
Understanding and addressing singularities in neural network optimization is crucial for improving model stability and performance. The proposed PSS method offers a new approach to enhance training efficiency and generalization, which can significantly impact machine learning applications.
Key Takeaways
- Singularities in the parametric space can destabilize neural network training.
- The proposed Parametric Singularity Smoothing (PSS) method effectively mitigates instability.
- PSS improves training efficiency and generalization across various datasets and architectures.
- Understanding the growth of singularities is key to enhancing model performance.
- The research provides a novel perspective on neural network optimization challenges.
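The bound mentioned in the abstract — that backpropagated gradient norms are controlled by the top singular values of the weight matrices — can be checked numerically. For a linear layer `y = W x`, backpropagation gives `dL/dx = Wᵀ (dL/dy)`, and the operator-norm inequality `‖Wᵀ g‖ ≤ σ_max(W) · ‖g‖` then bounds the downstream gradient. The sketch below verifies this with random data; it illustrates the general linear-algebra fact, not the paper's specific derivation:

```python
import numpy as np

# For a linear layer y = W x, backprop gives dL/dx = W^T (dL/dy).
# The operator norm inequality ||W^T g|| <= sigma_max(W) * ||g||
# bounds the backpropagated gradient by W's top singular value.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
g_y = rng.standard_normal(64)        # upstream gradient dL/dy
g_x = W.T @ g_y                      # backpropagated gradient dL/dx

sigma_max = np.linalg.svd(W, compute_uv=False)[0]  # top singular value
lhs = np.linalg.norm(g_x)
rhs = sigma_max * np.linalg.norm(g_y)
assert lhs <= rhs + 1e-9
```

As the top singular value grows during training, this bound loosens, which is the mechanism the paper identifies behind sharp loss explosions.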
Computer Science > Machine Learning
arXiv:2602.01308 (cs)
[Submitted on 1 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v2)]
Title: Dispelling the Curse of Singularities in Neural Network Optimizations
Authors: Hengjie Cao, Mengyi Chen, Yifeng Yang, Fang Dong, Ruijun Huang, Anrui Chen, Jixian Zhou, Mingzhi Dong, Yujiang Wang, Dongsheng Li, Wenyi Fang, Yuanyi Lin, Fan Wu, Li Shang
Abstract: This work investigates the optimization instability of deep neural networks from a less-explored yet insightful perspective: the emergence and amplification of singularities in the parametric space. Our analysis reveals that parametric singularities inevitably grow with gradient updates and further intensify alignment with representations, leading to increased singularities in the representation space. We show that the gradient Frobenius norms are bounded by the top singular values of the weight matrices, and as training progresses, the mutually reinforcing growth of weight and representation singularities, termed the curse of singularities, relaxes these bounds, escalating the risk of sharp loss explosions. To counter this, we propose Parametric Singularity Smoothing (PSS), a lightweight, flexible, and effective method for smoothing the singular spectra of weight matrices. Extensive experiments across diverse dataset...
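The abstract describes PSS as smoothing the singular spectra of weight matrices but does not spell out the update rule. A minimal sketch of one plausible smoothing scheme — shrinking each singular value toward the spectrum's mean via SVD — is shown below. The function name, the shrinkage rule, and the `alpha` parameter are illustrative assumptions, not the paper's actual PSS algorithm:

```python
import numpy as np

def smooth_singular_spectrum(W, alpha=0.5):
    """Hypothetical spectrum smoothing: shrink each singular value of W
    toward the spectrum's mean by a factor alpha in [0, 1].
    This is an illustrative stand-in, not the paper's PSS update."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_smoothed = (1.0 - alpha) * s + alpha * s.mean()
    return U @ np.diag(s_smoothed) @ Vt

# Example: smoothing reduces the top singular value whenever the
# spectrum is non-uniform, tightening the gradient-norm bound.
rng = np.random.default_rng(1)
W = rng.standard_normal((32, 32))
W_smooth = smooth_singular_spectrum(W, alpha=0.5)
top_before = np.linalg.svd(W, compute_uv=False)[0]
top_after = np.linalg.svd(W_smooth, compute_uv=False)[0]
assert top_after < top_before
```

Any rule of this shape leaves the singular directions untouched while flattening the spectrum, which is one straightforward way to keep the top singular value — and hence the gradient bound — from running away during training.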