[2602.21276] Neural network optimization strategies and the topography of the loss landscape
Summary
This paper explores neural network optimization strategies, focusing on the differences between stochastic gradient descent (SGD) and quasi-Newton methods in navigating loss landscapes.
Why It Matters
Understanding the optimization strategies in neural networks is crucial for improving model performance and generalization. This research highlights how different methods affect the loss landscape, which can inform future developments in machine learning algorithms and applications.
Key Takeaways
- SGD explores smoother basins of attraction, leading to solutions separated by lower loss barriers.
- Quasi-Newton methods find deeper minima that generalize less well to unseen data.
- The choice of optimizer significantly impacts the resulting neural network performance.
- Understanding loss landscape topography aids in developing robust models.
- Early stopping regularization affects both SGD and quasi-Newton solutions.
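The SGD-versus-quasi-Newton contrast in the takeaways above can be illustrated on a toy non-convex loss. This is a minimal sketch, not the paper's setup: it uses a one-dimensional loss, plain SGD with additive gradient noise, and SciPy's BFGS as a stand-in quasi-Newton method.

```python
import numpy as np
from scipy.optimize import minimize

# Toy non-convex loss: a quadratic bowl with sinusoidal ripples,
# giving several local minima of different depths.
def loss(w):
    return 0.5 * w[0] ** 2 + 2.0 * np.sin(3.0 * w[0]) ** 2

def grad(w):
    return np.array([w[0] + 6.0 * np.sin(3.0 * w[0]) * np.cos(3.0 * w[0])])

rng = np.random.default_rng(0)
w0 = np.array([2.5])

# Stochastic gradient descent: small steps, noisy gradients.
w = w0.copy()
for _ in range(500):
    w -= 0.01 * (grad(w) + 0.1 * rng.standard_normal(1))

# Quasi-Newton (BFGS) from the same starting point: uses curvature
# estimates to take larger, better-directed steps.
res = minimize(loss, w0, jac=grad, method="BFGS")

print("SGD loss:", loss(w), " BFGS loss:", res.fun)
```

Both runs descend well below the starting loss; which basin each lands in depends on the noise and the curvature-driven step directions, which is the kind of landscape-dependent behavior the paper investigates systematically.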
Computer Science > Machine Learning
arXiv:2602.21276 (cs) [Submitted on 24 Feb 2026]
Title: Neural network optimization strategies and the topography of the loss landscape
Authors: Jianneng Yu, Alexandre V. Morozov
Abstract: Neural networks are trained by optimizing multi-dimensional sets of fitting parameters on non-convex loss landscapes. Low-loss regions of the landscapes correspond to the parameter sets that perform well on the training data. A key issue in machine learning is the performance of trained neural networks on previously unseen test data. Here, we investigate neural network training by stochastic gradient descent (SGD), a non-convex global optimization algorithm which relies only on the gradient of the objective function. We contrast SGD solutions with those obtained via a non-stochastic quasi-Newton method, which utilizes curvature information to determine step direction and Golden Section Search to choose step size. We use several computational tools to investigate neural network parameters obtained by these two optimization methods, including kernel Principal Component Analysis and a novel, general-purpose algorithm for finding low-height paths between pairs of points on loss or energy landscapes, FourierPathFinder. We find that the choice of the optimizer profoundly affects the...
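The abstract mentions that the quasi-Newton method chooses its step size with Golden Section Search. A minimal sketch of a standard golden-section line search follows; this is a textbook version for a unimodal function on an interval, not the authors' implementation.

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden-section search.

    Each iteration shrinks the bracketing interval by the factor
    1/phi ~ 0.618, reusing one interior evaluation point per step.
    """
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    return (a + b) / 2
```

In a line-search setting, `f` would be the loss restricted to the quasi-Newton step direction, so the returned point fixes the step size along that direction.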