[2603.03084] On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions


arXiv - AI 3 min read

About this article


Computer Science > Machine Learning
arXiv:2603.03084 (cs) [Submitted on 3 Mar 2026]

Title: On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions
Authors: Linyan Gu, Lihua Yang, Feng Zhou

Abstract: Transformer networks have achieved remarkable empirical success across a wide range of applications, yet their theoretical expressive power remains insufficiently understood. In this paper, we study the expressive capabilities of Transformer architectures. We first establish an explicit approximation of maxout networks by Transformer networks while preserving comparable model complexity. As a consequence, Transformers inherit the universal approximation capability of ReLU networks under similar complexity constraints. Building on this connection, we develop a framework to analyze the approximation of continuous piecewise linear functions by Transformers and quantitatively characterize their expressivity via the number of linear regions, which grows exponentially with depth. Our analysis establishes a theoretical bridge between approximation theory for standard feedforward neural networks and Transformer architectures. It also yields structural insights into Transformers: self-attention layers implement max-type operations, while feedforward la...
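To make the abstract's central objects concrete, here is a minimal sketch (not the paper's construction) of a maxout unit, and of how a softmax-weighted average with a sharp inverse temperature approaches the hard max, one elementary way to see attention-style weighting acting as a "max-type operation". All names and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A maxout unit: the max over k affine maps of the input -- a simple
# continuous piecewise linear (CPWL) function.
k, d = 4, 3
W = rng.normal(size=(k, d))   # weights of the k affine pieces
b = rng.normal(size=k)        # biases

def maxout(x):
    return float(np.max(W @ x + b))

# A softmax-weighted average with sharp inverse temperature `beta`
# approaches the hard max (illustrative, not the paper's attention layer).
def attention_max(x, beta=200.0):
    z = W @ x + b
    a = np.exp(beta * (z - z.max()))   # numerically stable softmax weights
    return float((a / a.sum()) @ z)    # weighted average -> max as beta grows

x = rng.normal(size=d)
print(maxout(x), attention_max(x))    # nearly equal for large beta
```

The weighted average never exceeds the hard max, and the gap shrinks like 1/beta, so sharpening the softmax recovers the maxout computation.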

Originally published on March 04, 2026. Curated by AI News.
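The abstract's claim that the number of linear regions grows exponentially with depth can be illustrated with the classic tent-map construction, a standard example from the ReLU-network expressivity literature, not taken from this paper. Each composition of the 2-piece tent map doubles the number of linear pieces:

```python
import numpy as np

# Tent (hat) map: a CPWL function with 2 linear pieces on [0, 1].
def hat(x):
    return np.where(x < 0.5, 2.0 * x, 2.0 - 2.0 * x)

def compose(f, times):
    """Compose the map `times` times (depth of the network analogue)."""
    def g(x):
        for _ in range(times):
            x = f(x)
        return x
    return g

def count_linear_pieces(f, depth):
    # Grid chosen so every dyadic breakpoint k / 2**depth is a grid node;
    # each grid cell then lies inside a single linear piece, and the
    # number of slope changes counts the pieces exactly.
    n = (2 ** depth) * 64
    x = np.linspace(0.0, 1.0, n + 1)
    slopes = np.diff(f(x)) / np.diff(x)
    return int(np.sum(~np.isclose(slopes[1:], slopes[:-1]))) + 1

for depth in range(1, 6):
    print(depth, count_linear_pieces(compose(hat, depth), depth))
# pieces double with each composition: 2, 4, 8, 16, 32
```

A depth-n composition yields 2**n linear pieces from only n two-piece layers, which is the kind of depth-versus-regions trade-off the abstract quantifies for Transformers.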

Related Articles

Machine Learning

[R] Fine-tuning services report

If you have some data and want to train or run a small custom model but don't have powerful enough hardware for training, fine-tuning ser...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Does ML have a "bible"/reference textbook at the Intermediate/Advanced level?

Hello, everyone! This is my first time posting here and I apologise if the question is, perhaps, a bit too basic for this sub-reddit. A b...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] ICML 2026 review policy debate: 100 responses suggest Policy B may score higher, while Policy A shows higher confidence

A week ago I made a thread asking whether ICML 2026’s review policy might have affected review outcomes, especially whether Policy A pape...

Reddit - Machine Learning · 1 min ·
Machine Learning

Nomadic raises $8.4 million to wrangle the data pouring off autonomous vehicles | TechCrunch

The company turns footage from robots into structured, searchable datasets with a deep learning model.

TechCrunch - AI · 6 min ·

