[2602.19017] Why ReLU? A Bit-Model Dichotomy for Deep Network Training

arXiv - Machine Learning · 4 min read

Summary

This paper investigates the complexity of training deep neural networks under a realistic bit-level model of computation, contrasting it with the idealized Real-RAM model and explaining why ReLU activations keep training comparatively tractable.

Why It Matters

Understanding the computational complexity of training deep networks is crucial for knowing what training algorithms can realistically achieve. This research shows that the choice of activation function, such as ReLU, determines whether exact training is even feasible on finite-precision hardware.

Key Takeaways

  • Training deep networks with polynomial activations of degree at least $2$ is $\#P$-hard, so exact training is believed to be strictly harder than NP-complete problems.
  • ReLU and other piecewise-linear activations keep precision requirements polynomially bounded, so training stays NP-complete; see the bit-length sketch after this list.
  • The study highlights the impact of finite-precision constraints on the learnability of neural networks.
  • Exploding and vanishing gradients are linked to the complexity of activation functions.
  • Standard backpropagation remains efficient with ReLU, operating in polynomial time.
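
The precision point in the takeaways above can be made concrete. Below is a minimal sketch, not from the paper, of why a degree-2 polynomial activation strains a bit-level model while ReLU does not: squaring an exact rational roughly doubles its bit-length at every layer, whereas ReLU only outputs 0 or a value that was already computed. The helper names (`bit_length`, `poly_layer`, `relu_layer`) are illustrative.

```python
from fractions import Fraction

def bit_length(q: Fraction) -> int:
    """Bits needed to write the rational q exactly (numerator plus denominator)."""
    return q.numerator.bit_length() + q.denominator.bit_length()

def poly_layer(x: Fraction) -> Fraction:
    # Degree-2 polynomial activation: exact squaring roughly doubles the bit-length.
    return x * x

def relu_layer(x: Fraction) -> Fraction:
    # ReLU: the output is either 0 or the input itself, so bit-length never grows.
    return x if x > 0 else Fraction(0)

x0 = Fraction(3, 7)  # a small rational input
poly, relu = x0, x0
for depth in range(1, 6):
    poly, relu = poly_layer(poly), relu_layer(relu)
    print(f"depth {depth}: squaring bits = {bit_length(poly):4d}, ReLU bits = {bit_length(relu)}")
```

Running this shows the exact values under the squaring activation needing roughly twice as many bits per layer (exponential in depth), while the ReLU path stays at a constant bit-length; this is the precision gap the dichotomy formalizes.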

Computer Science > Machine Learning
arXiv:2602.19017 (cs) [Submitted on 22 Feb 2026]

Title: Why ReLU? A Bit-Model Dichotomy for Deep Network Training
Authors: Ilan Doron-Arad, Elchanan Mossel

Abstract: Theoretical analyses of Empirical Risk Minimization (ERM) are standardly framed within the Real-RAM model of computation. In this setting, training even simple neural networks is known to be $\exists \mathbb{R}$-complete -- a complexity class believed to be harder than NP, that characterizes the difficulty of solving systems of polynomial inequalities over the real numbers. However, this algebraic framework diverges from the reality of digital computation with finite-precision hardware. In this work, we analyze the theoretical complexity of ERM under a realistic bit-level model ($\mathsf{ERM}_{\text{bit}}$), where network parameters and inputs are constrained to be rational numbers with polynomially bounded bit-lengths. Under this model, we reveal a sharp dichotomy in tractability governed by the network's activation function. We prove that for deep networks with {\em any} polynomial activations with rational coefficients and degree at least $2$, the bit-complexity of training is severe: deciding $\mathsf{ERM}_{\text{bit}}$ is $\#P$-Hard, hence believed to be strictly harder than NP-complete problems. Furthermore, we show ...
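
The abstract is cut off above, but the tractable side of the dichotomy follows the standard NP pattern: for ReLU (piecewise-linear) networks, a candidate set of rational weights with polynomially bounded bit-lengths serves as a certificate, and a forward pass over the training data verifies it in polynomial time, since each layer uses only additions, multiplications by the candidate weights, and max(0, ·). The following is a hedged sketch of such a verifier; the fully connected architecture, squared loss, and function names are illustrative assumptions, not the paper's construction.

```python
from fractions import Fraction
from typing import List, Tuple

Vector = List[Fraction]
Layer = Tuple[List[Vector], Vector]  # (rows of the weight matrix, bias vector)

def relu_forward(layers: List[Layer], x: Vector) -> Vector:
    """Exact forward pass of a fully connected ReLU network over the rationals."""
    for weights, bias in layers:
        x = [max(Fraction(0), sum((w * v for w, v in zip(row, x)), Fraction(0)) + b)
             for row, b in zip(weights, bias)]
    return x

def verify_erm_bit(layers: List[Layer],
                   data: List[Tuple[Vector, Vector]],
                   threshold: Fraction) -> bool:
    """NP-style verifier: accept iff the candidate weights reach squared loss <= threshold."""
    loss = Fraction(0)
    for x, y in data:
        pred = relu_forward(layers, x)
        loss += sum(((p - t) ** 2 for p, t in zip(pred, y)), Fraction(0))
    return loss <= threshold
```

Because every intermediate value in a ReLU network is an affine combination of the inputs with coefficients built from polynomially many weight products, its exact bit-length stays polynomial in the size of the certificate, which is what keeps the ReLU case inside NP; with a degree-2 activation the same exact evaluation can require exponentially many bits.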
