[2510.08570] Who Said Neural Networks Aren't Linear?
Summary
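This paper explores the linearity of neural networks by introducing a framework that identifies non-standard vector spaces in which a neural network acts as a linear operator. The resulting architecture, which sandwiches a linear operator between two invertible neural networks, is termed a Linearizer.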
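The sketch below is a minimal NumPy version of the construction. It assumes the standard transport-of-structure definitions for the induced operations: $x \oplus x' = g_x^{-1}(g_x(x) + g_x(x'))$ and $a \odot x = g_x^{-1}(a\,g_x(x))$, and likewise on $Y$ with $g_y$. The sinh-based `ToyInvertibleNet`, `transported_ops`, and `linearizer` are hypothetical illustrations, not the authors' architecture or code. They only show that $f(x)=g_y^{-1}(A g_x(x))$ fails ordinary additivity but is additive and homogeneous under the transported operations.

```python
import numpy as np


class ToyInvertibleNet:
    """Hypothetical invertible network g(x) = W2 @ sinh(W1 @ x + b).

    W1 and W2 are square and (almost surely) invertible, and sinh is a bijection
    on the reals, so g has the closed-form inverse implemented below.
    """

    def __init__(self, dim, rng):
        self.W1 = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
        self.W2 = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
        self.b = 0.1 * rng.standard_normal(dim)

    def forward(self, x):
        return self.W2 @ np.sinh(self.W1 @ x + self.b)

    def inverse(self, z):
        u = np.arcsinh(np.linalg.solve(self.W2, z))
        return np.linalg.solve(self.W1, u - self.b)


def transported_ops(g):
    """Addition and scaling on X pulled back from R^n through g (assumed form)."""
    def add(u, v):
        return g.inverse(g.forward(u) + g.forward(v))

    def scale(a, u):
        return g.inverse(a * g.forward(u))

    return add, scale


def linearizer(g_x, g_y, A):
    """f(x) = g_y^{-1}(A g_x(x))."""
    def f(x):
        return g_y.inverse(A @ g_x.forward(x))
    return f


rng = np.random.default_rng(0)
dim = 4
g_x, g_y = ToyInvertibleNet(dim, rng), ToyInvertibleNet(dim, rng)
A = rng.standard_normal((dim, dim))
f = linearizer(g_x, g_y, A)
add_x, scale_x = transported_ops(g_x)
add_y, scale_y = transported_ops(g_y)

x1, x2, a = 0.5 * rng.standard_normal(dim), 0.5 * rng.standard_normal(dim), 1.7

# With ordinary + and *, f is not additive (the sinh layers and biases are nonlinear) ...
linear_in_standard_ops = np.allclose(f(x1 + x2), f(x1) + f(x2))
# ... but with the transported operations it is additive and homogeneous.
additive = np.allclose(f(add_x(x1, x2)), add_y(f(x1), f(x2)))
homogeneous = np.allclose(f(scale_x(a, x1)), scale_y(a, f(x1)))
print(linear_in_standard_ops, additive, homogeneous)
```

Under these operations the zero vector of $X$ is $g_x^{-1}(0)$, which for the toy network above is $-W_1^{-1}b$ rather than the origin.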
Why It Matters
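Understanding when neural networks are linear can improve their interpretability and broaden their use in machine learning tasks. Because a Linearizer is linear with respect to its induced vector spaces, standard linear-algebra tools such as SVD, the pseudo-inverse, and orthogonal projection become applicable to nonlinear mappings. The authors report that this enables collapsing diffusion sampling into a single step and building globally projective generative models.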
Key Takeaways
- Introduces Linearizers, architectures of the form $f(x)=g_y^{-1}(A g_x(x))$ that act as linear operators between vector spaces whose addition and scaling are induced by the invertible networks $g_x$ and $g_y$.
- Shows that composing two Linearizers that share a neural network yields another Linearizer.
- Applies the framework to diffusion model training, collapsing hundreds of sampling steps into a single step.
- Enforces idempotency ($f(f(x))=f(x)$) in networks, leading to globally projective generative models.
- Facilitates modular style transfer using the proposed architecture.
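The composition property in the list above reduces to a one-line identity: if $f_1 = g_y^{-1} \circ A \circ g_x$ and $f_2 = g_z^{-1} \circ B \circ g_y$ share $g_y$, then $f_2 \circ f_1 = g_z^{-1} \circ BA \circ g_x$. With the same network $g$ on both sides, $n$ repeated steps equal one step with $A^n$, and a projection matrix $P$ ($P^2 = P$) gives an idempotent map. The sketch below checks these three identities numerically on a hypothetical toy invertible network (`ToyInvertibleNet`, `linearizer` are illustrative names). It shows the algebra only and does not reproduce the paper's diffusion training or its idempotent generative model.

```python
import numpy as np


class ToyInvertibleNet:
    """Hypothetical invertible network g(x) = W2 @ sinh(W1 @ x + b) with a closed-form inverse."""

    def __init__(self, dim, rng):
        self.W1 = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
        self.W2 = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
        self.b = 0.1 * rng.standard_normal(dim)

    def forward(self, x):
        return self.W2 @ np.sinh(self.W1 @ x + self.b)

    def inverse(self, z):
        u = np.arcsinh(np.linalg.solve(self.W2, z))
        return np.linalg.solve(self.W1, u - self.b)


def linearizer(g_in, g_out, M):
    """f(x) = g_out^{-1}(M g_in(x))."""
    def f(x):
        return g_out.inverse(M @ g_in.forward(x))
    return f


rng = np.random.default_rng(1)
dim = 4
x = 0.5 * rng.standard_normal(dim)

# 1) Composition: f1 and f2 share g_y, so f2(f1(x)) is one Linearizer with matrix B @ A.
g_x, g_y, g_z = (ToyInvertibleNet(dim, rng) for _ in range(3))
A, B = rng.standard_normal((dim, dim)), rng.standard_normal((dim, dim))
f1, f2 = linearizer(g_x, g_y, A), linearizer(g_y, g_z, B)
composed_ok = np.allclose(f2(f1(x)), linearizer(g_x, g_z, B @ A)(x))

# 2) Collapse: n steps of g^{-1}(A_step g(.)) equal a single step with A_step^n.
g = ToyInvertibleNet(dim, rng)
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
A_step = 0.9 * Q  # contraction keeps the iterates well scaled
step = linearizer(g, g, A_step)
n_steps = 10
x_iter = x.copy()
for _ in range(n_steps):
    x_iter = step(x_iter)
one_shot = linearizer(g, g, np.linalg.matrix_power(A_step, n_steps))(x)
collapse_ok = np.allclose(x_iter, one_shot)

# 3) Idempotency: a projection P (P @ P == P) with a shared g gives f(f(x)) == f(x).
U, _ = np.linalg.qr(rng.standard_normal((dim, 2)))  # orthonormal columns
P = U @ U.T
proj = linearizer(g, g, P)
idempotent_ok = np.allclose(proj(proj(x)), proj(x))

print(composed_ok, collapse_ok, idempotent_ok)
```

The key design constraint is that the inner network is shared: the round trip $g \circ g^{-1}$ between consecutive steps cancels, so only the linear factors multiply.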
arXiv:2510.08570 (Computer Science > Machine Learning). Submitted on 9 Oct 2025 (v1); last revised 20 Feb 2026 (this version, v2).
Authors: Nimrod Berman, Assaf Hallak, Assaf Shocher
Abstract: Neural networks are famously nonlinear. However, linearity is defined relative to a pair of vector spaces, $f:X \to Y$. Leveraging the algebraic concept of transport of structure, we propose a method to explicitly identify non-standard vector spaces where a neural network acts as a linear operator. When sandwiching a linear operator $A$ between two invertible neural networks, $f(x)=g_y^{-1}(A g_x(x))$, the corresponding vector spaces $X$ and $Y$ are induced by newly defined addition and scaling actions derived from $g_x$ and $g_y$. We term this kind of architecture a Linearizer. This framework makes the entire arsenal of linear algebra, including SVD, pseudo-inverse, orthogonal projection and more, applicable to nonlinear mappings. Furthermore, we show that the composition of two Linearizers that share a neural network is also a Linearizer. We leverage this property and demonstrate that training diffusion models using our architecture makes the hundreds of sampling steps collapse into a single step. We further utilize our framework to enforce idempotency (i.e. $f(f(x))=f(x)$) on networks leading to a g...
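The claim that the pseudo-inverse and SVD carry over to nonlinear mappings can be checked on a toy Linearizer. Assuming $f = g_y^{-1} \circ A \circ g_x$, a natural candidate pseudo-inverse is $f^{+}(y) = g_x^{-1}(A^{+} g_y(y))$, and each singular triple $(u_i, s_i, v_i)$ of $A$ gives $f(g_x^{-1}(v_i)) = s_i \odot_y g_y^{-1}(u_i)$ under the induced scaling $a \odot_y y = g_y^{-1}(a\,g_y(y))$. The NumPy sketch below, built on a hypothetical toy invertible network with $X$ and $Y$ of different dimensions, checks both identities; it is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np


class ToyInvertibleNet:
    """Hypothetical invertible network g(x) = W2 @ sinh(W1 @ x + b) with a closed-form inverse."""

    def __init__(self, dim, rng):
        self.W1 = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
        self.W2 = np.eye(dim) + 0.3 * rng.standard_normal((dim, dim))
        self.b = 0.1 * rng.standard_normal(dim)

    def forward(self, x):
        return self.W2 @ np.sinh(self.W1 @ x + self.b)

    def inverse(self, z):
        u = np.arcsinh(np.linalg.solve(self.W2, z))
        return np.linalg.solve(self.W1, u - self.b)


rng = np.random.default_rng(2)
n_in, n_out = 3, 4
g_x, g_y = ToyInvertibleNet(n_in, rng), ToyInvertibleNet(n_out, rng)
A = rng.standard_normal((n_out, n_in))  # full column rank with probability 1


def f(x):
    """Linearizer f(x) = g_y^{-1}(A g_x(x)) from R^3 to R^4."""
    return g_y.inverse(A @ g_x.forward(x))


A_pinv = np.linalg.pinv(A)


def f_pinv(y):
    """Transported pseudo-inverse f^+(y) = g_x^{-1}(A^+ g_y(y)) (assumed form)."""
    return g_x.inverse(A_pinv @ g_y.forward(y))


def scale_y(a, y):
    """Induced scaling on Y: g_y^{-1}(a g_y(y))."""
    return g_y.inverse(a * g_y.forward(y))


x = 0.5 * rng.standard_normal(n_in)
# A^+ A = I for full column rank, so f^+ undoes f exactly.
recovered_ok = np.allclose(f_pinv(f(x)), x)

# Transported SVD: f maps g_x^{-1}(v_i) to s_i (.) g_y^{-1}(u_i).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
svd_ok = all(
    np.allclose(f(g_x.inverse(Vt[i])), scale_y(s[i], g_y.inverse(U[:, i])))
    for i in range(len(s))
)
print(recovered_ok, svd_ok)
```

Because the random $A$ here has full column rank, $A^{+}A = I$ and $f^{+}$ recovers $x$ exactly; for a rank-deficient $A$ only the Penrose-style identities, such as $f(f^{+}(f(x))) = f(x)$, would hold.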