[2602.16673] Neighborhood Stability as a Measure of Nearest Neighbor Searchability
Summary
The paper introduces two measures for assessing the searchability of datasets in clustering-based Approximate Nearest Neighbor Search (ANNS), enhancing the understanding of clustering quality and dataset suitability.
Why It Matters
This research addresses a significant gap in the analysis of clustering-based ANNS, providing tools that can predict the accuracy of nearest neighbor searches. By focusing on neighborhood stability, it offers a new perspective on dataset evaluation, which is crucial for improving machine learning applications in high-dimensional spaces.
Key Takeaways
- Introduces Clustering-Neighborhood Stability Measure (clustering-NSM) for evaluating clustering quality.
- Presents Point-Neighborhood Stability Measure (point-NSM) to assess dataset clusterability.
- Both measures rely on nearest neighbor relationships, making them versatile across various distance functions.
- Provides a framework for determining dataset suitability for clustering-based ANNS.
- Enhances the analytical tools available for machine learning practitioners.
Computer Science > Machine Learning arXiv:2602.16673 (cs) [Submitted on 18 Feb 2026] Title:Neighborhood Stability as a Measure of Nearest Neighbor Searchability Authors:Thomas Vecchiato, Sebastian Bruch View a PDF of the paper titled Neighborhood Stability as a Measure of Nearest Neighbor Searchability, by Thomas Vecchiato and Sebastian Bruch View PDF HTML (experimental) Abstract:Clustering-based Approximate Nearest Neighbor Search (ANNS) organizes a set of points into partitions, and searches only a few of them to find the nearest neighbors of a query. Despite its popularity, there are virtually no analytical tools to determine the suitability of clustering-based ANNS for a given dataset -- what we call "searchability." To address that gap, we present two measures for flat clusterings of high-dimensional points in Euclidean space. First is Clustering-Neighborhood Stability Measure (clustering-NSM), an internal measure of clustering quality -- a function of a clustering of a dataset -- that we show to be predictive of ANNS accuracy. The second, Point-Neighborhood Stability Measure (point-NSM), is a measure of clusterability -- a function of the dataset itself -- that is predictive of clustering-NSM. The two together allow us to determine whether a dataset is searchable by clustering-based ANNS given only the data points. Importantly, both are functions of nearest neighbor relationships between points, not distances, making them applicable to various distance functions incl...