[2602.22130] Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination
Summary
This paper addresses the sample complexity of robust mean estimation under mean-shift contamination, providing new algorithms and bounds for general distributions.
Why It Matters
Understanding sample complexity in mean estimation is crucial for developing robust statistical methods, especially in fields like machine learning and data science where data integrity can be compromised. This research advances the theoretical foundation necessary for creating more resilient algorithms.
Key Takeaways
- The paper resolves an open question regarding sample complexity in mean estimation under mean-shift contamination.
- It introduces a sample-efficient algorithm that achieves desired accuracy under mild spectral conditions.
- The study complements upper bounds with matching lower bounds, enhancing understanding of the problem's complexity.
- Fourier analysis plays a critical role in the proposed methods, introducing the concept of a Fourier witness.
- The findings have implications for robust statistical estimation in various applications.
Computer Science > Machine Learning arXiv:2602.22130 (cs) [Submitted on 25 Feb 2026] Title:Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination Authors:Ilias Diakonikolas, Giannis Iakovidis, Daniel M. Kane, Sihan Liu View a PDF of the paper titled Sample Complexity Bounds for Robust Mean Estimation with Mean-Shift Contamination, by Ilias Diakonikolas and 3 other authors View PDF HTML (experimental) Abstract:We study the basic task of mean estimation in the presence of mean-shift contamination. In the mean-shift contamination model, an adversary is allowed to replace a small constant fraction of the clean samples by samples drawn from arbitrarily shifted versions of the base distribution. Prior work characterized the sample complexity of this task for the special cases of the Gaussian and Laplace distributions. Specifically, it was shown that consistent estimation is possible in these cases, a property that is provably impossible in Huber's contamination model. An open question posed in earlier work was to determine the sample complexity of mean estimation in the mean-shift contamination model for general base distributions. In this work, we study and essentially resolve this open question. Specifically, we show that, under mild spectral conditions on the characteristic function of the (potentially multivariate) base distribution, there exists a sample-efficient algorithm that estimates the target mean to any desired accuracy. We complement our ...