[2602.12390] Rational Neural Networks have Expressivity Advantages

arXiv - Machine Learning 3 min read Article

Summary

The paper explores the advantages of Rational Neural Networks, demonstrating their superior expressivity and parameter efficiency compared to traditional activation functions in neural networks.

Why It Matters

This research highlights a significant advancement in neural network design, showing that rational activation functions can enhance model performance while reducing complexity. This could lead to more efficient AI systems and better resource utilization in machine learning applications.

Key Takeaways

  • Networks with trainable rational activations are provably more expressive than those built from standard fixed activations.
  • They reach the same approximation error with exponentially fewer parameters.
  • The findings make a case for trainable rational activations as a drop-in alternative in neural network architecture design.

Computer Science > Machine Learning
arXiv:2602.12390 (cs) · Submitted on 12 Feb 2026

Title: Rational Neural Networks have Expressivity Advantages
Authors: Maosen Tang, Alex Townsend

Abstract: We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate seamlessly into standard architectures and training pipelines, allowing them to match or outperform fixed activations under identical architectures and optimizers.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Numerical Analysis (math.NA)
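To make the idea concrete, here is a minimal sketch of what a low-degree rational activation looks like when applied elementwise. It is not the paper's implementation: the degree choice, coefficients, and the `1 + |Q(x)|` denominator safeguard are illustrative assumptions (keeping the denominator strictly positive is a common stabilization trick in prior work on rational activations); in practice the coefficients `p` and `q` would be trained alongside the network's weights.

```python
def horner(coeffs, x):
    """Evaluate a polynomial with coefficients listed highest-degree first."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def rational_activation(x, p, q):
    """r(x) = P(x) / (1 + |Q(x)|).

    The 1 + |.| form keeps the denominator strictly positive, so the
    activation has no poles on the real line (an assumed safeguard,
    not necessarily the paper's construction).
    """
    return horner(p, x) / (1.0 + abs(horner(q, x)))

# Illustrative degree-(2, 1) coefficients (hypothetical, not the paper's):
p = (0.5, 0.5, 0.0)   # P(x) = 0.5*x^2 + 0.5*x
q = (1.0, 0.0)        # Q(x) = x

# Elementwise application, as an activation layer would do:
out = [rational_activation(x, p, q) for x in (-2.0, 0.0, 2.0)]
print(out)  # smooth, ReLU-like: small for negative x, ~x for positive x
```

With these coefficients the function vanishes at the origin and grows roughly linearly for positive inputs, loosely mimicking a smoothed ReLU; a trainable version simply treats `p` and `q` as learnable parameters updated by the same optimizer as the rest of the network.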

Related Articles

Machine Learning

Educational PyTorch repo for distributed training from scratch: DP, FSDP, TP, FSDP+TP, and PP

I put together a small educational repo that implements distributed training parallelism from scratch in PyTorch: https://github.com/shre...

Reddit - Artificial Intelligence · 1 min ·
Llms

Claude cannot be trusted to perform complex engineering tasks

AMD’s AI director just analyzed 6,852 Claude Code sessions, 234,760 tool calls, and 17,871 thinking blocks. Her conclusion: “Claude canno...

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

Training an AI to play Resident Evil Requiem using Behavior Cloning + HG-DAgger [P]

Code of Project: https://github.com/paulo101977/notebooks-rl/tree/main/re_requiem I’ve been working on training an agent to play a segmen...

Reddit - Machine Learning · 1 min ·