[2601.16427] Perfect Clustering for Sparse Directed Stochastic Block Models
Summary
The paper presents a novel non-spectral method for community detection in sparse directed stochastic block models, addressing limitations of existing spectral approaches.
Why It Matters
This research is significant as it provides a robust solution for community detection in directed networks, which is crucial for understanding complex systems in various fields such as social networks, biology, and information science. The method's ability to ensure exact recovery of community labels under challenging conditions enhances its applicability in real-world scenarios.
Key Takeaways
- Introduces a two-stage procedure for community detection in sparse directed stochastic block models.
- First method to guarantee exact recovery of community labels in sparse, directed settings.
- Utilizes neighborhood-smoothing techniques to overcome limitations of spectral methods.
- Demonstrates reliable performance through simulation studies in asymmetric, low-degree regimes.
- Provides theoretical insights with a uniform row-wise concentration bound for the estimator.
Statistics > Machine Learning arXiv:2601.16427 (stat) [Submitted on 23 Jan 2026 (v1), last revised 16 Feb 2026 (this version, v2)] Title:Perfect Clustering for Sparse Directed Stochastic Block Models Authors:Behzad Aalipur, Yichen Qin View a PDF of the paper titled Perfect Clustering for Sparse Directed Stochastic Block Models, by Behzad Aalipur and Yichen Qin View PDF HTML (experimental) Abstract:Exact recovery in stochastic block models (SBMs) is well understood in undirected settings, but remains considerably less developed for directed and sparse networks, particularly when the number of communities diverges. Spectral methods for directed SBMs often lack stability in asymmetric, low-degree regimes, and existing non-spectral approaches focus primarily on undirected or dense settings. We propose a fully non-spectral, two-stage procedure for community detection in sparse directed SBMs with potentially growing numbers of communities. The method first estimates the directed probability matrix using a neighborhood-smoothing scheme tailored to the asymmetric setting, and then applies $K$-means clustering to the estimated rows, thereby avoiding the limitations of eigen- or singular value decompositions in sparse, asymmetric networks. Our main theoretical contribution is a uniform row-wise concentration bound for the smoothed estimator, obtained through new arguments that control asymmetric neighborhoods and separate in- and out-degree effects. These results imply the exact rec...