[2505.14338] Better Neural Network Expressivity: Subdividing the Simplex
Summary
This paper investigates the expressivity of ReLU neural networks, demonstrating that fewer hidden layers are needed than previously conjectured to compute all continuous piecewise linear functions.
Why It Matters
Understanding the expressivity of neural networks is crucial for optimizing their architecture and performance. This research challenges existing beliefs about the depth required for certain computations, potentially influencing future designs in machine learning models.
Key Takeaways
- The paper disproves the conjecture that $\lceil \log_2(n+1) \rceil$ hidden layers are optimal for CPWL functions.
- It establishes that $\lceil \log_3(n-1) \rceil + 1$ hidden layers suffice for computing all CPWL functions.
- The findings provide a geometric interpretation of neural network expressivity through polyhedral subdivisions.
- The study shows that two hidden layers suffice to exactly represent the maximum function of five inputs.
- These insights could lead to more efficient neural network architectures.
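To make the depth question concrete, here is a minimal sketch of the classic ReLU identity $\max(a,b) = b + \mathrm{ReLU}(a-b)$, which gives the maximum of two inputs with one hidden layer, and the binary-tournament nesting that yields $\lceil \log_2(n) \rceil$ hidden layers for $n$ inputs. This is the standard baseline construction, not the paper's new five-input, two-layer construction; the function names are our own.

```python
def relu(x: float) -> float:
    """Rectified linear unit."""
    return x if x > 0 else 0.0

def max2(a: float, b: float) -> float:
    # max(a, b) = b + relu(a - b): exactly representable with one hidden ReLU layer
    return b + relu(a - b)

def max4(a: float, b: float, c: float, d: float) -> float:
    # Nesting two tournament rounds uses two hidden layers for four inputs.
    # The paper's contribution is that two hidden layers also suffice for FIVE inputs,
    # which this classic construction cannot achieve.
    return max2(max2(a, b), max2(c, d))
```

Each nesting level corresponds to one additional hidden layer, which is why the tournament construction needs $\lceil \log_2(n) \rceil$ layers in general.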
Computer Science > Machine Learning — arXiv:2505.14338 (cs)
[Submitted on 20 May 2025 (v1), last revised 19 Feb 2026 (this version, v3)]
Title: Better Neural Network Expressivity: Subdividing the Simplex
Authors: Egor Bakaev, Florestan Brunck, Christoph Hertrich, Jack Stade, Amir Yehudayoff
Abstract: This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to compute all continuous piecewise linear (CPWL) functions on $\mathbb{R}^n$. Hertrich, Basu, Di Summa, and Skutella (NeurIPS'21 / SIDMA'23) conjectured that this result is optimal in the sense that there are CPWL functions on $\mathbb{R}^n$, like the maximum function, that require this depth. We disprove the conjecture and show that $\lceil\log_3(n-1)\rceil+1$ hidden layers are sufficient to compute all CPWL functions on $\mathbb{R}^n$. A key step in the proof is that ReLU neural networks with two hidden layers can exactly represent the maximum function of five inputs. More generally, we show that $\lceil\log_3(n-2)\rceil+1$ hidden layers are sufficient to compute the maximum of $n\geq 4$ numbers. Our constructions almost match the $\lceil\log_3(n)\rceil$ lower bound of Averkov, Hojny, and Merkert (ICLR'25) in the special case of ReLU networks ...
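The abstract's improvement can be made tangible by comparing layer counts: the classic tournament construction needs $\lceil \log_2(n) \rceil$ hidden layers for the maximum of $n$ numbers, while the paper's bound is $\lceil \log_3(n-2) \rceil + 1$ for $n \geq 4$. A small sketch (helper names are our own; the exact-power logarithm is computed by counting to avoid floating-point rounding):

```python
def ceil_log(base: int, m: int) -> int:
    """Exact ceil(log_base(m)) for m >= 1, computed without floating point."""
    depth, power = 0, 1
    while power < m:
        power *= base
        depth += 1
    return depth

def classic_layers(n: int) -> int:
    # Binary-tournament construction: ceil(log2(n)) hidden layers for max of n inputs.
    return ceil_log(2, n)

def new_bound_layers(n: int) -> int:
    # Bound from the paper, valid for n >= 4: ceil(log3(n-2)) + 1 hidden layers.
    assert n >= 4
    return ceil_log(3, n - 2) + 1
```

For example, the maximum of five numbers takes three hidden layers under the classic construction but only two under the new bound, matching the paper's key step.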