[2602.22985] Kernel Integrated $R^2$: A Measure of Dependence
Summary
The paper introduces Kernel Integrated $R^2$, a statistical measure of dependence that extends integrated $R^2$ using reproducing kernel Hilbert spaces (RKHSs). This lets it handle responses in general spaces, including multivariate, functional, and structured data.
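The summary gives no formula, so the following is background only. It assumes the population definitions used in the earlier literature: Chatterjee-type coefficients and integrated $R^2$, with $\mu$ the law of $Y$ and the integral restricted to thresholds $t$ where the denominator is nonzero. Under that assumption, the local normalization principle can be contrasted with global normalization. Each ratio inside $\nu$ is the population $R^2$ of predicting the indicator $\mathbf{1}\{Y \ge t\}$ from $X$, which is where the name comes from. The third line is a hypothetical kernelization that replaces indicators with kernel sections $k(\cdot, y)$. It is not necessarily the definition used in the paper.

$$\xi(X,Y) = \frac{\int \operatorname{Var}\big(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid X]\big)\,d\mu(t)}{\int \operatorname{Var}\big(\mathbf{1}\{Y \ge t\}\big)\,d\mu(t)} \qquad \text{(global normalization)}$$

$$\nu(X,Y) = \int \frac{\operatorname{Var}\big(\mathbb{E}[\mathbf{1}\{Y \ge t\} \mid X]\big)}{\operatorname{Var}\big(\mathbf{1}\{Y \ge t\}\big)}\,d\mu(t) \qquad \text{(local normalization)}$$

$$\nu_k(X,Y) = \int \frac{\operatorname{Var}\big(\mathbb{E}[k(Y,y) \mid X]\big)}{\operatorname{Var}\big(k(Y,y)\big)}\,dP_Y(y) \qquad \text{(hypothetical kernel analogue)}$$

Under this hypothetical form, each ratio is $0$ when $\mathbb{E}[k(Y,y)\mid X]$ is constant, as it is under independence. It is $1$ when $Y = g(X)$, because then $\mathbb{E}[k(Y,y)\mid X] = k(Y,y)$. These are the same boundary cases the paper establishes for its measure.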
Why It Matters
This research matters because integrated $R^2$ is defined only for scalar responses. The kernel version carries its guarantees over to multivariate, functional, and structured data. It keeps the $[0,1]$ scale, equals zero exactly under independence, and equals one exactly when the response is a measurable function of the covariates. This makes it relevant wherever complex relationships must be quantified, as in machine learning and statistics.
Key Takeaways
- Kernel Integrated $R^2$ extends integrated $R^2$ from scalar responses to responses in any space equipped with a characteristic kernel.
- The measure is sensitive to tail behavior and oscillatory dependence structures.
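- Two estimators are proposed: a graph-based estimator using $K$-nearest neighbours and an RKHS-based estimator built on conditional mean embeddings. They show competitive performance in dependence testing.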
- The graph-based estimator is consistent, with convergence rates that adapt to the intrinsic dimensionality of the data.
- Numerical experiments validate its effectiveness against existing measures.
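The idea behind a graph-based estimator can be illustrated with a minimal sketch. It is not the paper's estimator. It assumes a hypothetical kernel form of local normalization in which each threshold indicator is replaced by a test function $f_j = k(\cdot, Y_j)$, using a Gaussian kernel (characteristic on $\mathbb{R}^q$). It then averages the ratios $\operatorname{Var}(\mathbb{E}[f_j(Y)\mid X]) / \operatorname{Var}(f_j(Y))$ over $j$. The numerator is estimated by pairing each response with the responses at the $K$ nearest neighbours of its covariate, in the spirit of Chatterjee-type nearest-neighbour estimators. The function names, the bandwidth `bw_y`, and `K` are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A (m, d) and B (p, d)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def knn_indices(X, K):
    """Indices of the K nearest neighbours of each row of X, excluding the row itself."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)  # a point is never its own neighbour
    return np.argsort(sq, axis=1)[:, :K]

def knn_kernel_r2(X, Y, K=5, bw_y=1.0, eps=1e-12):
    """Hypothetical K-NN estimate of a locally normalized kernel R^2 (illustration only)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    # Column j holds the test function f_j(Y_i) = k(Y_i, Y_j), centred over i.
    G = gaussian_gram(Y, Y, bw_y)
    Gc = G - G.mean(axis=0, keepdims=True)
    # Responses at nearby covariates behave like conditionally independent draws
    # given (almost) the same X, so their covariance approximates Var(E[f_j(Y) | X]).
    nbr_mean = Gc[knn_indices(X, K)].mean(axis=1)  # shape (n, n)
    num = np.mean(Gc * nbr_mean, axis=0)  # estimates Var(E[f_j(Y) | X])
    den = np.mean(Gc ** 2, axis=0)        # estimates Var(f_j(Y))
    keep = den > eps                      # skip numerically constant test functions
    return float(np.mean(num[keep] / den[keep]))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=400)
print(knn_kernel_r2(x, x ** 2))                        # Y is a function of X: near 1
print(knn_kernel_r2(x, rng.uniform(-1, 1, size=400)))  # independent: near 0
```

The brute-force neighbour search uses $O(n^2)$ memory and is meant only for small illustrative samples. The adaptivity to intrinsic dimensionality proved in the paper concerns its own estimator, not this sketch.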
arXiv:2602.22985 (Statistics > Machine Learning), submitted on 26 Feb 2026
Authors: Pouya Roudaki, Shakeel Gavioli-Akilagun, Florian Kalinke, Mona Azadkia, Zoltán Szabó
Abstract: We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced integrated $R^2$ with the flexibility of reproducing kernel Hilbert spaces (RKHSs). The proposed measure extends integrated $R^2$ from scalar responses to responses taking values in general spaces equipped with a characteristic kernel, allowing one to measure the dependence of multivariate, functional, and structured data, while remaining sensitive to tail behaviour and oscillatory dependence structures. We establish that (i) this new measure takes values in $[0,1]$, (ii) equals zero if and only if independence holds, and (iii) equals one if and only if the response is almost surely a measurable function of the covariates. Two estimators are proposed: a graph-based method using $K$-nearest neighbours and an RKHS-based method built on conditional mean embeddings. We prove consistency and derive convergence rates for the graph-based estimator, showing its adaptation to intrinsic dimensionality. Numerical experiments on simulated data and a real data exper...
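The RKHS-based route can be sketched in a similar spirit. It assumes a hypothetical kernel form of local normalization: the ratios $\operatorname{Var}(\mathbb{E}[f_j(Y)\mid X]) / \operatorname{Var}(f_j(Y))$ are averaged over test functions $f_j = k(\cdot, Y_j)$, with Gaussian kernels on both $X$ and $Y$. The standard empirical conditional mean embedding weights the training responses by $(K_X + n\lambda I)^{-1} k_X(x)$. Applying it to $f_j$ is the same as running kernel ridge regression on the targets $f_j(Y_i)$, for all $j$ at once. In-sample fits would overstate dependence, so the sketch uses the exact leave-one-out identity for ridge smoothers. The regularization `lam`, the bandwidths, and the function names are assumptions. The paper's actual RKHS estimator may differ.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A (m, d) and B (p, d)."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def cme_kernel_r2(X, Y, lam=1e-3, bw_x=0.5, bw_y=1.0, eps=1e-12):
    """Hypothetical CME-based estimate of a locally normalized kernel R^2 (illustration only)."""
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    Y = np.asarray(Y, dtype=float).reshape(len(Y), -1)
    n = X.shape[0]
    # Column j holds the test function f_j(Y_i) = k_Y(Y_i, Y_j), centred over i.
    G = gaussian_gram(Y, Y, bw_y)
    Gc = G - G.mean(axis=0, keepdims=True)
    # Ridge/CME smoother H = (K_X + n*lam*I)^{-1} K_X, which equals K_X (K_X + n*lam*I)^{-1}.
    KX = gaussian_gram(X, X, bw_x)
    H = np.linalg.solve(KX + n * lam * np.eye(n), KX)
    fitted = H @ Gc                       # in-sample estimates of E[f_j(Y) | X = X_i]
    h = np.diag(H)[:, None]
    loo = (fitted - h * Gc) / (1.0 - h)   # exact leave-one-out ridge predictions
    num = np.mean(Gc * loo, axis=0)       # estimates Var(E[f_j(Y) | X]) as a covariance
    den = np.mean(Gc ** 2, axis=0)        # estimates Var(f_j(Y))
    keep = den > eps                      # skip numerically constant test functions
    return float(np.mean(num[keep] / den[keep]))

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=300)
print(cme_kernel_r2(x, x ** 2))                        # Y is a function of X: near 1
print(cme_kernel_r2(x, rng.uniform(-1, 1, size=300)))  # independent: near 0
```

Using the leave-one-out prediction keeps $Y_i$ out of its own conditional-mean estimate. This is the same reason the graph-based sketch excludes a point from its own neighbourhood.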