[2602.19017] Why ReLU? A Bit-Model Dichotomy for Deep Network Training

arXiv - Machine Learning · 4 min read

Summary

This paper investigates the complexity of training deep neural networks under a realistic bit-level model of computation, contrasting it with the idealized Real-RAM model and explaining why ReLU activations keep training comparatively tractable.

Why It Matters

Understanding the computational complexity of training deep networks is crucial for knowing what training algorithms can realistically achieve. This research shows that the choice of activation function, such as ReLU, determines whether exact training is even feasible on finite-precision hardware.

Key Takeaways

  • Training deep networks with polynomial activations of degree at least $2$ is $\#P$-hard, so exact training is believed to be strictly harder than NP-complete problems.
  • ReLU and other piecewise-linear activations keep precision requirements polynomially bounded, so training stays NP-complete; see the bit-length sketch after this list.
  • The study highlights the impact of finite-precision constraints on the learnability of neural networks.
  • Exploding and vanishing gradients are linked to the complexity of activation functions.
  • Standard backpropagation remains efficient with ReLU, operating in polynomial time.
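
The precision point in the takeaways above can be made concrete. Below is a minimal sketch, not from the paper, of why a degree-2 polynomial activation strains a bit-level model while ReLU does not: squaring an exact rational roughly doubles its bit-length at every layer, whereas ReLU only outputs 0 or a value that was already computed. The helper names (`bit_length`, `poly_layer`, `relu_layer`) are illustrative.

```python
from fractions import Fraction

def bit_length(q: Fraction) -> int:
    """Bits needed to write the rational q exactly (numerator plus denominator)."""
    return q.numerator.bit_length() + q.denominator.bit_length()

def poly_layer(x: Fraction) -> Fraction:
    # Degree-2 polynomial activation: exact squaring roughly doubles the bit-length.
    return x * x

def relu_layer(x: Fraction) -> Fraction:
    # ReLU: the output is either 0 or the input itself, so bit-length never grows.
    return x if x > 0 else Fraction(0)

x0 = Fraction(3, 7)  # a small rational input
poly, relu = x0, x0
for depth in range(1, 6):
    poly, relu = poly_layer(poly), relu_layer(relu)
    print(f"depth {depth}: squaring bits = {bit_length(poly):4d}, ReLU bits = {bit_length(relu)}")
```

Running this shows the exact values under the squaring activation needing roughly twice as many bits per layer (exponential in depth), while the ReLU path stays at a constant bit-length; this is the precision gap the dichotomy formalizes.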

Computer Science > Machine Learning
arXiv:2602.19017 (cs) [Submitted on 22 Feb 2026]

Title: Why ReLU? A Bit-Model Dichotomy for Deep Network Training
Authors: Ilan Doron-Arad, Elchanan Mossel

Abstract: Theoretical analyses of Empirical Risk Minimization (ERM) are standardly framed within the Real-RAM model of computation. In this setting, training even simple neural networks is known to be $\exists \mathbb{R}$-complete -- a complexity class believed to be harder than NP, that characterizes the difficulty of solving systems of polynomial inequalities over the real numbers. However, this algebraic framework diverges from the reality of digital computation with finite-precision hardware. In this work, we analyze the theoretical complexity of ERM under a realistic bit-level model ($\mathsf{ERM}_{\text{bit}}$), where network parameters and inputs are constrained to be rational numbers with polynomially bounded bit-lengths. Under this model, we reveal a sharp dichotomy in tractability governed by the network's activation function. We prove that for deep networks with {\em any} polynomial activations with rational coefficients and degree at least $2$, the bit-complexity of training is severe: deciding $\mathsf{ERM}_{\text{bit}}$ is $\#P$-Hard, hence believed to be strictly harder than NP-complete problems. Furthermore, we show ...
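
The abstract is cut off above, but the tractable side of the dichotomy follows the standard NP pattern: for ReLU (piecewise-linear) networks, a candidate set of rational weights with polynomially bounded bit-lengths serves as a certificate, and a forward pass over the training data verifies it in polynomial time, since each layer uses only additions, multiplications by the candidate weights, and max(0, ·). The following is a hedged sketch of such a verifier; the fully connected architecture, squared loss, and function names are illustrative assumptions, not the paper's construction.

```python
from fractions import Fraction
from typing import List, Tuple

Vector = List[Fraction]
Layer = Tuple[List[Vector], Vector]  # (rows of the weight matrix, bias vector)

def relu_forward(layers: List[Layer], x: Vector) -> Vector:
    """Exact forward pass of a fully connected ReLU network over the rationals."""
    for weights, bias in layers:
        x = [max(Fraction(0), sum((w * v for w, v in zip(row, x)), Fraction(0)) + b)
             for row, b in zip(weights, bias)]
    return x

def verify_erm_bit(layers: List[Layer],
                   data: List[Tuple[Vector, Vector]],
                   threshold: Fraction) -> bool:
    """NP-style verifier: accept iff the candidate weights reach squared loss <= threshold."""
    loss = Fraction(0)
    for x, y in data:
        pred = relu_forward(layers, x)
        loss += sum(((p - t) ** 2 for p, t in zip(pred, y)), Fraction(0))
    return loss <= threshold
```

Because every intermediate value in a ReLU network is an affine combination of the inputs with coefficients built from polynomially many weight products, its exact bit-length stays polynomial in the size of the certificate, which is what keeps the ReLU case inside NP; with a degree-2 activation the same exact evaluation can require exponentially many bits.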
