[2602.19691] Smoothness Adaptivity in Constant-Depth Neural Networks: Optimal Rates via Smooth Activations
Summary
This paper studies the advantages of smooth activation functions in constant-depth neural networks, showing that they achieve minimax-optimal approximation and estimation error rates that non-smooth activations such as ReLU cannot match at fixed depth.
Why It Matters
Understanding the role of activation smoothness in neural networks is crucial for improving model performance and efficiency. This research could influence future designs of neural architectures, particularly by showing how to achieve statistical optimality without increasing network depth.
Key Takeaways
- Smooth activation functions allow constant-depth networks to exploit high orders of target function smoothness.
- These networks achieve minimax-optimal approximation and estimation error rates, up to logarithmic factors (the benchmark rates are sketched after this list).
- Non-smooth activations such as ReLU lack this adaptivity: their attainable approximation order is limited by depth, so capturing higher-order smoothness requires proportionally deeper networks.
- Activation smoothness is identified as a key mechanism for statistical optimality in neural networks.
- The analysis rests on a constructive approximation framework that produces explicit networks with controlled complexity.
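For context, the benchmarks in question are the classical minimax rates over the Sobolev ball $W^{s,\infty}([0,1]^d)$; this is a standard fact stated here for orientation, while the paper's precise statements, constants, and logarithmic factors are in the full text. For nonparametric regression with $n$ samples,
$$
\inf_{\hat f_n}\ \sup_{f \in W^{s,\infty}([0,1]^d)} \mathbb{E}\,\bigl\|\hat f_n - f\bigr\|_{L^2}^2 \;\asymp\; n^{-\frac{2s}{2s+d}},
$$
and for approximation by networks with $N$ nonzero parameters the benchmark error is of order $N^{-s/d}$, up to logarithmic factors. The paper's claim is that constant-depth networks with smooth activations attain both benchmarks for arbitrary smoothness $s > 0$.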
arXiv:2602.19691 [stat.ML] (Submitted on 23 Feb 2026)
Authors: Yuhao Liu, Zilin Wang, Lei Wu, Shaobo Zhang
Abstract: Smooth activation functions are ubiquitous in modern deep learning, yet their theoretical advantages over non-smooth counterparts remain poorly understood. In this work, we characterize both approximation and statistical properties of neural networks with smooth activations over the Sobolev space $W^{s,\infty}([0,1]^d)$ for arbitrary smoothness $s>0$. We prove that constant-depth networks equipped with smooth activations automatically exploit arbitrarily high orders of target function smoothness, achieving the minimax-optimal approximation and estimation error rates (up to logarithmic factors). In sharp contrast, networks with non-smooth activations, such as ReLU, lack this adaptivity: their attainable approximation order is strictly limited by depth, and capturing higher-order smoothness requires proportional depth growth. These results identify activation smoothness as a fundamental mechanism, alternative to depth, for attaining statistical optimality. Technically, our results are established via a constructive approximation framework that produces explicit neural networks...
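The abstract's contrast between smooth and non-smooth activations at fixed depth can be probed empirically. Below is a minimal, hedged sketch, not the paper's construction: it fits a smooth 1-d target with two one-hidden-layer networks that differ only in activation, where the width, optimizer, step count, and target function are all illustrative assumptions.

```python
# Minimal empirical sketch (not the paper's construction): fit a smooth
# 1-d target with one-hidden-layer networks differing only in activation.
# Width, learning rate, and step count are illustrative choices.
import torch

torch.manual_seed(0)
x = torch.linspace(0.0, 1.0, 512).unsqueeze(1)  # inputs on [0, 1]
y = torch.sin(2 * torch.pi * x)                 # smooth target (in W^{s,inf} for every s)

def fit(activation, width=64, steps=5000, lr=1e-2):
    """Train a depth-2 network with the given activation; return final MSE."""
    model = torch.nn.Sequential(
        torch.nn.Linear(1, width), activation, torch.nn.Linear(width, 1)
    )
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.mean((model(x) - y) ** 2)
        loss.backward()
        opt.step()
    return loss.item()

print("tanh (smooth)    :", fit(torch.nn.Tanh()))
print("ReLU (non-smooth):", fit(torch.nn.ReLU()))
```

On such a smooth target one would expect the tanh network to reach lower error at equal width and depth, consistent with the paper's thesis that activation smoothness, rather than depth, drives adaptivity; a toy run like this is suggestive only and proves nothing about rates.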