[2603.24384] On the Use of Bagging for Local Intrinsic Dimensionality Estimation
Computer Science > Machine Learning
arXiv:2603.24384 (cs)
[Submitted on 25 Mar 2026]

Title: On the Use of Bagging for Local Intrinsic Dimensionality Estimation
Authors: Kristóf Péter, Ricardo J. G. B. Campello, James Bailey, Michael E. Houle

Abstract: The theory of Local Intrinsic Dimensionality (LID) has become a valuable tool for characterizing local complexity within and across data manifolds, supporting a range of data mining and machine learning tasks. Accurate LID estimation requires samples drawn from small neighborhoods around each query to avoid biases from nonlocal effects and potential manifold mixing, yet limited data within such neighborhoods tends to cause high estimation variance. As a variance reduction strategy, we propose an ensemble approach that uses subbagging to preserve the local distribution of nearest neighbor (NN) distances. The main challenge is that the uniform reduction in total sample size within each subsample increases the proximity threshold for finding a fixed number k of NNs around the query. As a result, in the specific context of LID estimation, the sampling rate has an additional, complex interplay with the neighborhood size, where both combined determine the sample size as well as the locality and resolution considered for estimation. We analyze both theoretically and ex...
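
To make the subbagging idea in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes the widely used maximum-likelihood (Hill-type) LID estimator as the base estimator and a plain average as the aggregation rule, neither of which the abstract specifies. The names `lid_mle`, `subbagged_lid`, `rate`, and `n_bags` are hypothetical and chosen for illustration only.

```python
import numpy as np


def lid_mle(dists):
    """Maximum-likelihood (Hill-type) LID estimate (assumed base estimator).

    dists: positive nearest-neighbor distances sorted in ascending order.
    The largest distance acts as the neighborhood radius.
    """
    dists = np.asarray(dists, dtype=float)
    radius = dists[-1]
    # Mean log-ratio of each distance to the radius; the last term is zero.
    return -1.0 / np.mean(np.log(dists / radius))


def subbagged_lid(data, query, k, rate, n_bags, seed=None):
    """Average per-subsample LID estimates (hypothetical aggregation rule).

    Each bag is drawn WITHOUT replacement (subbagging) at sampling rate
    `rate`, and the k nearest neighbors of `query` are taken inside the bag.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    m = max(k, int(round(rate * n)))  # subsample size, at least k points
    all_dists = np.linalg.norm(data - query, axis=1)

    estimates = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=False)
        d = np.sort(all_dists[idx])[:k]
        d = d[d > 0]  # drop zero distances, e.g. the query itself
        estimates.append(lid_mle(d))
    return float(np.mean(estimates))
```

The sketch also shows the interplay the abstract describes: with `k` fixed, a subsample drawn at rate `rate` reaches about as far from the query as the `k / rate` nearest neighbors in the full data, so lowering the rate widens the effective neighborhood while the number of distances per estimate stays the same. A call such as `subbagged_lid(X, q, k=100, rate=0.5, n_bags=20, seed=0)` would then return one averaged estimate for the query `q`.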