[2602.23023] Low-degree Lower bounds for clustering in moderate dimension
Summary
This paper explores the clustering of points from a mixture of isotropic Gaussians in moderate dimensions, establishing new polynomial lower bounds and presenting a novel algorithm for better performance in this regime.
Why It Matters
Understanding clustering in moderate dimensions is crucial for improving machine learning algorithms, especially as data complexity increases. This research addresses a significant gap in existing literature, providing insights that could enhance clustering methodologies and their applications in various fields.
Key Takeaways
- Investigates clustering of points in moderate dimensions, a less explored area.
- Establishes new low-degree polynomial lower bounds for clustering performance.
- Introduces a non-spectral algorithm that matches the non-parametric rate of clustering difficulty.
- Highlights the differences in clustering challenges between high-dimensional and moderate-dimensional regimes.
- Addresses the computational limits of clustering problems, providing a foundation for future research.
Mathematics > Statistics Theory arXiv:2602.23023 (math) [Submitted on 26 Feb 2026] Title:Low-degree Lower bounds for clustering in moderate dimension Authors:Alexandra Carpentier, Nicolas Verzelen View a PDF of the paper titled Low-degree Lower bounds for clustering in moderate dimension, by Alexandra Carpentier and Nicolas Verzelen View PDF Abstract:We study the fundamental problem of clustering $n$ points into $K$ groups drawn from a mixture of isotropic Gaussians in $\mathbb{R}^d$. Specifically, we investigate the requisite minimal distance $\Delta$ between mean vectors to partially recover the underlying partition. While the minimax-optimal threshold for $\Delta$ is well-established, a significant gap exists between this information-theoretic limit and the performance of known polynomial-time procedures. Although this gap was recently characterized in the high-dimensional regime ($n \leq dK$), it remains largely unexplored in the moderate-dimensional regime ($n \geq dK$). In this manuscript, we address this regime by establishing a new low-degree polynomial lower bound for the moderate-dimensional case when $d \geq K$. We show that while the difficulty of clustering for $n \leq dK$ is primarily driven by dimension reduction and spectral methods, the moderate-dimensional regime involves more delicate phenomena leading to a "non-parametric rate". We provide a novel non-spectral algorithm matching this rate, shedding new light on the computational limits of the clustering...