[2603.21991] λ-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks
Computer Science > Machine Learning
arXiv:2603.21991 (cs)
[Submitted on 23 Mar 2026]

Title: λ-GELU: Learning Gating Hardness for Controlled ReLU-ization in Deep Networks
Authors: Cristian Pérez-Corral, Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Enrique S. Quintana-Ortí

Abstract: The Gaussian Error Linear Unit (GELU) is a widely used smooth alternative to the Rectified Linear Unit (ReLU), yet many deployment, compression, and analysis toolchains are most naturally expressed for piecewise-linear (ReLU-type) networks. We study a hardness-parameterized formulation of GELU, f(x; λ) = x Φ(λx), where Φ is the Gaussian CDF and λ ∈ [1, ∞) controls gate sharpness, with the goal of turning smooth gated training into a controlled path toward ReLU-compatible models. Learning λ is non-trivial: naive updates yield unstable dynamics and effective gradient attenuation, so we introduce a constrained reparameterization and an optimizer-aware update scheme. Empirically, across a diverse set of model--dataset pairs spanning MLPs, CNNs, and Transformers, we observe structured layerwise hardness profiles and assess their robustness under different initializations. We further study a deterministic ReLU-ization strategy in which the learned gates are progr...
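The activation defined in the abstract, f(x; λ) = x Φ(λx), can be sketched numerically with the standard-library error function; this is an illustrative implementation, not the authors' code. The function name `lambda_gelu` and the sample inputs are assumptions for the example. Setting λ = 1 recovers plain GELU, and as λ grows the Gaussian gate Φ(λx) sharpens toward a step function, so the activation approaches ReLU:

```python
import math

def lambda_gelu(x: float, lam: float = 1.0) -> float:
    """Hypothetical sketch of λ-GELU: f(x; λ) = x * Φ(λx),
    where Φ is the standard Gaussian CDF.

    λ = 1 recovers plain GELU; λ → ∞ approaches ReLU,
    since Φ(λx) → 1 for x > 0 and Φ(λx) → 0 for x < 0.
    """
    # Gaussian CDF via the error function: Φ(z) = (1 + erf(z / √2)) / 2
    phi = 0.5 * (1.0 + math.erf(lam * x / math.sqrt(2.0)))
    return x * phi

# λ = 1: the usual GELU, e.g. f(1; 1) = Φ(1) ≈ 0.8413.
# Large λ: the gate is nearly a hard threshold, so f(x; λ) ≈ max(0, x).
```

This limiting behavior is what makes the hardness parameter a "controlled path toward ReLU-compatible models": pushing the learned λ upward during or after training moves each gate continuously from smooth GELU toward exact ReLU.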