[2602.21846] Scalable Kernel-Based Distances for Statistical Inference and Integration
Summary
This thesis studies scalable kernel-based distances for statistical inference, focusing on improved estimation of the maximum mean discrepancy (MMD) and introducing novel kernel quantile discrepancies as computationally efficient alternatives.
Why It Matters
Kernel-based distances underpin many statistical inference methods in machine learning, but computing them at scale is a persistent challenge. By addressing these computational bottlenecks and proposing new estimators, this research could make kernel-based inference more robust and efficient in practice, which makes it relevant to both practitioners and researchers in the field.
Key Takeaways
- The maximum mean discrepancy (MMD) is a key kernel-based distance used in statistical inference.
- The thesis proposes improved estimators for MMD that enhance simulation-based inference (a standard baseline estimator is sketched after this list).
- Novel kernel quantile discrepancies are introduced as competitive alternatives to MMD.
- The research emphasizes the importance of efficient computation in statistical methods.
- Future work is suggested to explore broader applications of these kernel-based distances.
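For readers unfamiliar with MMD estimation, the following is a minimal sketch of the standard quadratic-time unbiased U-statistic estimator of squared MMD with a Gaussian kernel. This is the textbook baseline, not the improved estimators proposed in the thesis; the bandwidth, sample sizes, and function names are illustrative choices.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased U-statistic estimator of squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Drop the diagonal terms so each within-sample expectation is unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.mean()
    return term_xx + term_yy - 2.0 * term_xy

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))  # samples from P
y = rng.normal(0.5, 1.0, size=(500, 2))  # samples from Q (shifted mean)
print(mmd2_unbiased(x, y))               # noticeably above zero
print(mmd2_unbiased(x, rng.normal(0.0, 1.0, size=(500, 2))))  # near zero
```

The quadratic cost in the sample size is exactly the kind of computational burden that motivates the scalable estimators studied in the thesis.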
Statistics > Machine Learning
arXiv:2602.21846 (stat) [Submitted on 25 Feb 2026]
Title: Scalable Kernel-Based Distances for Statistical Inference and Integration
Authors: Masha Naslidnyk
Abstract: Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the methods in which they are used: for example, certain distances can allow one to encode robustness or smoothness of the problem. Kernel methods offer flexible and rich Hilbert space representations of distributions that allow the modeller to enforce properties through the choice of kernel, and estimate associated distances at efficient nonparametric rates. In particular, the maximum mean discrepancy (MMD), a kernel-based distance constructed by comparing Hilbert space mean functions, has received significant attention due to its computational tractability and is favoured by practitioners. In this thesis, we conduct a thorough study of kernel-based distances with a focus on efficient computation, with core contributions in Chapters 3 to 6. Part I of the thesis is focused on the MMD, specifically on improved MMD estimation. In Chapter 3 we propose a theoretically sound, improved estimator for MMD in simulation-based inferen...
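The abstract describes the MMD as a distance built by comparing Hilbert space mean functions. For concreteness, the standard definition (general background, not a result specific to this thesis) can be written as:

```latex
\[
\mathrm{MMD}_k(P, Q) \;=\; \bigl\| \mu_P - \mu_Q \bigr\|_{\mathcal{H}_k},
\qquad
\mu_P \;=\; \mathbb{E}_{X \sim P}\bigl[k(X, \cdot)\bigr],
\]
\[
\mathrm{MMD}_k^2(P, Q)
\;=\; \mathbb{E}\bigl[k(X, X')\bigr]
   + \mathbb{E}\bigl[k(Y, Y')\bigr]
   - 2\,\mathbb{E}\bigl[k(X, Y)\bigr],
\]
where $X, X' \sim P$ and $Y, Y' \sim Q$ are independent, $k$ is the chosen
kernel, and $\mathcal{H}_k$ is its reproducing kernel Hilbert space.
```

The unbiased estimator sketched earlier simply replaces each of these three expectations with a sample average over the observed data.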