[2602.19799] Path-conditioned training: a principled way to rescale ReLU neural networks


Summary

The paper presents path-conditioned training, a principled approach to rescaling the parameters of ReLU neural networks that can improve training dynamics and efficiency.

Why It Matters

This research addresses a significant gap in the understanding of rescaling symmetries in neural networks, which can lead to improved training speeds and performance. By introducing a geometrically motivated criterion for parameter rescaling, it opens new avenues for optimizing neural network architectures, which is crucial in the rapidly evolving field of machine learning.

Key Takeaways

  • Introduces path-conditioned training for rescaling ReLU networks.
  • Demonstrates how rescaling can significantly impact training dynamics.
  • Proposes an efficient algorithm for parameter alignment.
  • Analyzes the joint impact of architecture and initialization scale.
  • Numerical experiments show potential for faster training.

Statistics > Machine Learning

arXiv:2602.19799 (stat) [Submitted on 23 Feb 2026]

Title: Path-conditioned training: a principled way to rescale ReLU neural networks

Authors: Arthur Lebeurrier, Titouan Vayer, Rémi Gribonval

Abstract: Despite recent algorithmic advances, we still lack principled ways to leverage the well-documented rescaling symmetries in ReLU neural network parameters. While two properly rescaled sets of weights implement the same function, their training dynamics can be dramatically different. To offer a fresh perspective on exploiting this phenomenon, we build on the recent path-lifting framework, which provides a compact factorization of ReLU networks. We introduce a geometrically motivated criterion for rescaling neural network parameters whose minimization leads to a conditioning strategy that aligns a kernel in the path-lifting space with a chosen reference. We derive an efficient algorithm to perform this alignment. In the context of random network initialization, we analyze how the architecture and the initialization scale jointly impact the output of the proposed method. Numerical experiments illustrate its potential to speed up training.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)

Cite as: arXiv:2602.19799 [stat.ML]
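The rescaling symmetry the abstract refers to is easy to demonstrate: because ReLU is positively homogeneous (relu(αx) = α·relu(x) for α > 0), multiplying a hidden unit's incoming weights by α and its outgoing weights by 1/α leaves the network function unchanged, even though the parameters (and hence the gradient dynamics) differ. The sketch below is only an illustration of this symmetry on a minimal two-layer network without biases; it is not the paper's path-lifting or conditioning method.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def net(x, W1, W2):
    # Two-layer ReLU network without biases.
    return W2 @ relu(W1 @ x)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3))   # hidden layer: 5 units
W2 = rng.standard_normal((2, 5))   # output layer
x = rng.standard_normal(3)

# Rescale hidden unit 0: multiply its incoming weights by alpha
# and its outgoing weights by 1/alpha (alpha > 0).
alpha = 10.0
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= alpha
W2s[:, 0] /= alpha

# The rescaled parameters implement the same function.
print(np.allclose(net(x, W1, W2), net(x, W1s, W2s)))  # True
```

Different choices of such rescalings give the same function but very different parameter magnitudes, which is exactly the degree of freedom the paper's conditioning criterion aims to fix in a principled way.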
