[2505.11846] Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks
Summary
This paper investigates the identifiability and singularity of polynomial neural networks, focusing on MLPs and CNNs, and explores their geometric properties using algebraic geometry.
Why It Matters
Understanding the identifiability of neural networks is crucial for improving model interpretability and performance. This research sheds light on the structure of neural networks, potentially guiding future developments in machine learning and AI.
Key Takeaways
- For almost all functions in an MLP's neuromanifold, only finitely many parameter choices realize that function.
- For CNNs, the parameterization is generically one-to-one.
- Singular points of neuromanifolds arise from sparse subnetworks.
- For MLPs, these singularities often correspond to critical points of the mean-squared error loss, giving a geometric explanation of the sparsity bias of MLPs.
- Algebraic geometry tools are essential for analyzing these neural network structures.
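The finite-to-one identifiability result for MLPs can be made concrete with a small sketch. The snippet below (an illustration under assumed notation, not code from the paper) builds a two-layer MLP with a polynomial activation and shows that permuting the hidden neurons, while compensating in the next layer, yields a different parameter choice realizing the same function. Such discrete symmetries are exactly why the parameterization can be at best finite-to-one rather than injective.

```python
import numpy as np

# Illustrative sketch: a two-layer MLP f(x) = V @ sigma(W @ x)
# with a polynomial activation sigma(t) = t^2 + t.
# Permuting the hidden units (rows of W) and compensating by
# permuting the corresponding columns of V leaves the function
# unchanged, so distinct parameters map to the same point of
# the neuromanifold.

rng = np.random.default_rng(0)

def sigma(t):
    return t**2 + t  # a polynomial activation, applied elementwise

def mlp(W, V, x):
    return V @ sigma(W @ x)

d_in, d_hidden, d_out = 3, 4, 2
W = rng.standard_normal((d_hidden, d_in))
V = rng.standard_normal((d_out, d_hidden))

perm = rng.permutation(d_hidden)
W_perm = W[perm, :]   # reorder hidden units in the first layer
V_perm = V[:, perm]   # compensate in the second layer

x = rng.standard_normal(d_in)
assert np.allclose(mlp(W, V, x), mlp(W_perm, V_perm, x))
print("permuted parameters realize the same function")
```

Note that for a CNN the convolutional weight sharing removes this freedom to reorder hidden channels layer by layer arbitrarily, which is consistent with the paper's finding that the CNN parameterization is generically one-to-one.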
arXiv:2505.11846 [cs] (Computer Science > Machine Learning)
Submitted on 17 May 2025 (v1); last revised 13 Feb 2026 (this version, v2)
Title: Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks
Authors: Vahid Shahverdi, Giovanni Luca Marchetti, Kathlén Kohn
Abstract: We study function spaces parametrized by neural networks, referred to as neuromanifolds. Specifically, we focus on deep Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) with an activation function that is a sufficiently generic polynomial. First, we address the identifiability problem, showing that, for almost all functions in the neuromanifold of an MLP, there exist only finitely many parameter choices yielding that function. For CNNs, the parametrization is generically one-to-one. As a consequence, we compute the dimension of the neuromanifold. Second, we describe singular points of neuromanifolds. We characterize singularities completely for CNNs, and partially for MLPs. In both cases, they arise from sparse subnetworks. For MLPs, we prove that these singularities often correspond to critical points of the mean-squared error loss, which does not hold for CNNs. This provides a geometric explanation of the sparsity bias of MLPs. All of our results leverage tools from algebraic geometry.