[2602.14607] A Bayesian Approach to Low-Discrepancy Subset Selection
Summary
This paper presents a Bayesian approach to low-discrepancy subset selection, addressing its NP-hardness and proposing a Bayesian Optimization method to improve selection efficiency.
Why It Matters
Understanding low-discrepancy subset selection is crucial for various applications in machine learning, robotics, and numerical analysis. This research provides a new Bayesian framework that could enhance the efficiency of selecting optimal subsets, which is significant given the NP-hard nature of the problem.
Key Takeaways
- Low-discrepancy designs are vital in quasi-Monte Carlo methods and related fields.
- The subset selection problem is established as NP-hard, complicating optimal selection.
- A Bayesian Optimization procedure is proposed to effectively minimize discrepancy measures.
- The framework is applicable across various design criteria, enhancing its utility.
- This research could influence future methodologies in machine learning and robotics.
Statistics > Methodology arXiv:2602.14607 (stat) [Submitted on 16 Feb 2026] Title:A Bayesian Approach to Low-Discrepancy Subset Selection Authors:Nathan Kirk View a PDF of the paper titled A Bayesian Approach to Low-Discrepancy Subset Selection, by Nathan Kirk View PDF HTML (experimental) Abstract:Low-discrepancy designs play a central role in quasi-Monte Carlo methods and are increasingly influential in other domains such as machine learning, robotics and computer graphics, to name a few. In recent years, one such low-discrepancy construction method called subset selection has received a lot of attention. Given a large population, one optimally selects a small low-discrepancy subset with respect to a discrepancy-based objective. Versions of this problem are known to be NP-hard. In this text, we establish, for the first time, that the subset selection problem with respect to kernel discrepancies is also NP-hard. Motivated by this intractability, we propose a Bayesian Optimization procedure for the subset selection problem utilizing the recent notion of deep embedding kernels. We demonstrate the performance of the BO algorithm to minimize discrepancy measures and note that the framework is broadly applicable any design criteria. Comments: Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Numerical Analysis (math.NA); Computation (stat.CO) MSC classes: 65C05, 60G15 Cite as: arXiv:2602.14607 [stat.ME] (or arXiv:2602.14607v1 [stat.ME] for this version) https://doi...