[2603.03672] Local Shapley: Model-Induced Locality and Optimal Reuse in Data Valuation
Computer Science > Machine Learning
arXiv:2603.03672 (cs)
[Submitted on 4 Mar 2026]

Title: Local Shapley: Model-Induced Locality and Optimal Reuse in Data Valuation
Authors: Xuan Yang, Hsi-Wen Chen, Ming-Syan Chen, Jian Pei

Abstract: The Shapley value provides a principled foundation for data valuation, but exact computation is #P-hard due to the exponential coalition space. Existing accelerations remain global and ignore a structural property of modern predictors: for a given test instance, only a small subset of training points influences the prediction. We formalize this model-induced locality through support sets defined by the model's computational pathway (e.g., neighbors in KNN, leaves in trees, receptive fields in GNNs), showing that Shapley computation can be projected onto these supports without loss when locality is exact. This reframes Shapley evaluation as a structured data processing problem over overlapping support-induced subset families rather than exhaustive coalition enumeration. We prove that the intrinsic complexity of Local Shapley is governed by the number of distinct influential subsets, establishing an information-theoretic lower bound on retraining operations. Guided by this result, we propose LSMR (Local Shapley via Model Reuse), an optimal subset-centric algorithm that trains each in...
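The claim that Shapley computation can be projected onto a support set without loss when locality is exact can be illustrated with a small sketch. This is not the paper's LSMR algorithm. It is a brute-force check under assumptions chosen for illustration: a hypothetical 1-D training set, a fixed-radius neighbor vote as the predictor (so the support set is exactly the in-radius training points), and a 0/1 correctness utility. The names `exact_shapley`, `utility`, `train`, and `radius` are all illustrative.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical 1-D training set (an assumption): index -> (feature, label).
train = {0: (0.1, 1), 1: (0.3, 1), 2: (0.4, 0),
         3: (2.0, 0), 4: (2.5, 1), 5: (3.0, 0)}
x_test, y_test, radius = 0.2, 1, 0.5

# Support set: the only training points a fixed-radius vote can ever consult.
support = [i for i, (x, _) in train.items() if abs(x - x_test) <= radius]


def utility(coalition):
    """1.0 if a strict majority of in-radius coalition members votes y_test."""
    votes = [train[i][1] for i in coalition if i in support]
    if not votes:
        return 0.0
    agree = sum(1 for y in votes if y == y_test)
    return 1.0 if 2 * agree > len(votes) else 0.0


global_values = exact_shapley(list(train), utility)  # 2**6 = 64 coalitions
local_values = exact_shapley(support, utility)       # 2**3 = 8 coalitions

for i in train:
    print(i, round(global_values[i], 4), round(local_values.get(i, 0.0), 4))
```

In this toy setting, points outside the support are null players, so their global values are zero and the values of the in-support points are unchanged when the computation is restricted to the support. The restricted computation enumerates 8 coalitions instead of 64. For predictors whose locality is only approximate, this equality would not be expected to hold exactly.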