[2602.19799] Path-conditioned training: a principled way to rescale ReLU neural networks
Summary
The paper presents a principled approach to rescaling ReLU neural networks via path-conditioned training, a conditioning strategy that can improve training dynamics and speed.
Why It Matters
This research addresses a significant gap: rescaling symmetries in ReLU neural networks are well documented, yet principled ways to exploit them have been lacking. By introducing a geometrically motivated criterion for parameter rescaling, the work opens new avenues for conditioning neural networks, which can translate into faster training in the rapidly evolving field of machine learning.
Key Takeaways
- Introduces path-conditioned training for rescaling ReLU networks.
- Demonstrates that rescalings leave the network function unchanged yet can significantly alter training dynamics (see the sketch after this list).
- Proposes an efficient algorithm for parameter alignment.
- Analyzes the joint impact of architecture and initialization scale.
- Numerical experiments show potential for faster training.
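The symmetry the paper exploits is a known property of ReLU: since relu(c·z) = c·relu(z) for any c > 0, scaling a hidden neuron's incoming weights by c and its outgoing weights by 1/c leaves the network function unchanged. Below is a minimal NumPy sketch of this invariance; the two-layer architecture, dimensions, and variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.standard_normal((d_hidden, d_in))
W2 = rng.standard_normal((d_out, d_hidden))

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, W2, x):
    return W2 @ relu(W1 @ x)

# Rescaling symmetry: for any c > 0, scaling a hidden neuron's incoming
# weights by c and its outgoing weights by 1/c leaves f unchanged, because
# relu(c * z) = c * relu(z) for c > 0 (positive homogeneity of ReLU).
c = rng.uniform(0.1, 10.0, size=d_hidden)   # one positive scale per neuron
W1_rescaled = c[:, None] * W1               # scale rows (incoming weights)
W2_rescaled = W2 / c[None, :]               # scale columns (outgoing weights)

x = rng.standard_normal(d_in)
assert np.allclose(forward(W1, W2, x), forward(W1_rescaled, W2_rescaled, x))
print("same function, different parameters")
```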
Statistics > Machine Learning — arXiv:2602.19799 [stat.ML]
[Submitted on 23 Feb 2026]

Title: Path-conditioned training: a principled way to rescale ReLU neural networks
Authors: Arthur Lebeurrier, Titouan Vayer, Rémi Gribonval

Abstract: Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled sets of weights implement the same function, their training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion for rescaling neural network parameters whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate its potential to speed up training.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as: arXiv:2602.19799 [stat.ML]
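The paper's alignment algorithm operates in the path-lifting space and is not reproduced here. For intuition about how such a rescaling can be chosen, a classical substitute is per-neuron norm balancing: pick each scale c_j so that neuron j's incoming and outgoing weight norms coincide after rescaling, which preserves the function while often improving conditioning. The sketch below shows that standard heuristic, not the paper's criterion; the function name balance_rescale and the Euclidean-norm choice are assumptions made for illustration.

```python
import numpy as np

def balance_rescale(W1, W2, eps=1e-12):
    """Per-neuron norm balancing for a two-layer ReLU net (illustrative,
    not the paper's path-lifting alignment). For hidden neuron j, choose
    c_j = sqrt(||W2[:, j]|| / ||W1[j, :]||), so that after rescaling both
    the incoming and outgoing weight norms equal sqrt(||W1[j,:]|| * ||W2[:,j]||)."""
    in_norms = np.linalg.norm(W1, axis=1)    # ||W1[j, :]||, one per hidden neuron
    out_norms = np.linalg.norm(W2, axis=0)   # ||W2[:, j]||, one per hidden neuron
    c = np.sqrt((out_norms + eps) / (in_norms + eps))
    # Same function-preserving rescaling as above: rows of W1 by c, columns of W2 by 1/c.
    return c[:, None] * W1, W2 / c[None, :]

# Usage: (W1_b, W2_b) implements the same function as (W1, W2), with balanced norms.
# W1_b, W2_b = balance_rescale(W1, W2)
```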