[2411.02225] Sparse Max-Affine Regression
Statistics > Machine Learning

arXiv:2411.02225 (stat)

[Submitted on 4 Nov 2024 (v1), last revised 4 Apr 2026 (this version, v3)]

Title: Sparse Max-Affine Regression

Authors: Haitham Kanj, Seonho Kim, Kiryung Lee

Abstract: This paper presents Sparse Gradient Descent (Sp-GD) as a solution for variable selection in convex piecewise-linear regression, where the model is given as the maximum of $k$ affine functions, $x \mapsto \max_{j \in [k]} \langle a_j^\star, x \rangle + b_j^\star$. Here, $\{ a_j^\star\}_{j=1}^k$ and $\{b_j^\star\}_{j=1}^k$ denote the ground-truth weight vectors and intercepts. A non-asymptotic local convergence analysis is provided for Sp-GD under sub-Gaussian noise when the covariate distribution satisfies sub-Gaussianity and anti-concentration properties. When the model order and parameters are fixed, Sp-GD provides an $\epsilon$-accurate estimate given $\mathcal{O}(\max(\epsilon^{-2}\sigma_z^2, 1)\, s\log(d/s))$ observations, where $\sigma_z^2$ denotes the noise variance and $s$ the sparsity level. This also implies exact parameter recovery by Sp-GD from $\mathcal{O}(s\log(d/s))$ noise-free observations. The proposed initialization scheme uses sparse principal component analysis to estimate the subspace spanned by $\{ a_j^\star\}_{j=1}^k$, then applies an $r$-covering search to estimate the model parameters. A non-asymptotic analysis is presente...
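The abstract describes gradient descent combined with a sparsity constraint on the max-affine model. The following is a minimal sketch of that idea, not the authors' algorithm: it alternates a gradient step on the squared loss (each sample's gradient flows to its active affine piece) with hard thresholding that keeps the $s$ largest columns of the weight matrix. All names (`sp_gd`, the joint column-wise thresholding rule, the step size, and the random initialization in place of the paper's sparse-PCA initialization) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, s = 2000, 50, 3, 5  # samples, ambient dimension, affine pieces, sparsity

# Ground truth: jointly s-sparse weight vectors a_j* and intercepts b_j*
support = rng.choice(d, size=s, replace=False)
A_true = np.zeros((k, d))
A_true[:, support] = rng.normal(size=(k, s))
b_true = rng.normal(size=k)

X = rng.normal(size=(n, d))                  # sub-Gaussian covariates
y = (X @ A_true.T + b_true).max(axis=1)      # noise-free max-affine labels

def sp_gd(X, y, k, s, lr=0.5, iters=300, rng=rng):
    """Hard-thresholded gradient descent on a max-affine model (a sketch)."""
    n, d = X.shape
    A = 0.01 * rng.normal(size=(k, d))       # random init (paper uses sparse PCA)
    b = np.zeros(k)
    for _ in range(iters):
        z = X @ A.T + b                      # (n, k) affine values
        j = z.argmax(axis=1)                 # active piece per sample
        r = z[np.arange(n), j] - y           # residuals of the max-affine fit
        for t in range(k):
            m = (j == t)                     # samples whose active piece is t
            if m.any():
                A[t] -= lr * (r[m] @ X[m]) / n
                b[t] -= lr * r[m].sum() / n
        # Sparsity projection: keep only the s columns with largest joint norm
        keep = np.argsort(-np.linalg.norm(A, axis=0))[:s]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        A[:, ~mask] = 0.0
    return A, b

A_hat, b_hat = sp_gd(X, y, k, s)
```

The thresholding step enforces a *joint* support across the $k$ weight vectors, which matches a variable-selection reading of the model: a coordinate is either relevant to all pieces or discarded entirely.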