[2511.19476] FAST: Topology-Aware Frequency-Domain Distribution Matching for Coreset Selection
Summary
The paper presents FAST, a DNN-free coreset selection framework that uses topology-aware, frequency-domain distribution matching to keep a selected subset distributionally close to the full dataset, reducing the energy and compute cost of training deep neural networks.
Why It Matters
Coreset selection lowers the cost of training deep neural networks by shrinking the dataset while aiming to preserve accuracy. This research targets two limitations of existing methods: DNN-based approaches are tied to model-specific parameters and introduce architectural bias, while DNN-free approaches rely on heuristics that lack theoretical guarantees. The result is a more effective and energy-efficient alternative.
Key Takeaways
- FAST introduces a DNN-free framework for coreset selection based on spectral graph theory.
- The method employs Characteristic Function Distance (CFD) to enhance distributional matching.
- It achieves an average accuracy gain of 9.12% over existing methods and reduces power consumption by 96.57%.
- The Progressive Discrepancy-Aware Sampling strategy improves convergence and efficiency.
- FAST demonstrates significant speed improvements, achieving a 2.2x average speedup in training.
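To make the Characteristic Function Distance in the takeaways above concrete, here is a minimal sketch of one common way to estimate such a distance between two samples: compare their empirical characteristic functions at a set of random frequencies. This is an illustration under stated assumptions, not the paper's formulation. The function names `empirical_cf` and `cfd`, the Gaussian frequency sampling, and the synthetic data are all hypothetical, and the sketch omits the graph constraint and the sampling strategy entirely.

```python
import numpy as np

def empirical_cf(x, freqs):
    """Empirical characteristic function of samples x (n, d) at freqs (m, d)."""
    proj = x @ freqs.T                      # (n, m) inner products <t, x>
    return np.exp(1j * proj).mean(axis=0)   # average of e^{i<t,x>} over samples

def cfd(x, y, freqs):
    """Mean squared gap between two empirical characteristic functions.

    Assumption: frequencies are supplied by the caller (here drawn from a
    standard Gaussian); the paper may weight or choose them differently.
    """
    diff = empirical_cf(x, freqs) - empirical_cf(y, freqs)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
full = rng.normal(size=(2000, 4))           # stand-in for dataset features
freqs = rng.normal(size=(256, 4))           # hypothetical frequency samples

# A uniformly random subset versus a subset whose distribution is shifted.
random_subset = full[rng.choice(len(full), size=200, replace=False)]
shifted_subset = full[:200] + 1.0

d_same = cfd(full, full, freqs)
d_subset = cfd(full, random_subset, freqs)
d_shifted = cfd(full, shifted_subset, freqs)
```

In this toy setup the distance is zero for identical samples and is larger for the shifted subset than for the random one, which is the property a distribution-matching selector would try to exploit when scoring candidate coresets.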
arXiv:2511.19476 (stat), Statistics > Machine Learning
Submitted on 22 Nov 2025 (v1), last revised 22 Feb 2026 (this version, v2)
Authors: Boran Zhao, Jin Cui, Jiajun Xu, Jiaqi Guo, Shuo Guan, Pengju Ren
Abstract
Coreset selection compresses large datasets into compact, representative subsets, reducing the energy and computational burden of training deep neural networks. Existing methods are either: (i) DNN-based, which are tied to model-specific parameters and introduce architectural bias; or (ii) DNN-free, which rely on heuristics lacking theoretical guarantees. Neither approach explicitly constrains distributional equivalence, largely because continuous distribution matching is considered inapplicable to discrete sampling. Moreover, prevalent metrics (e.g., MSE, KL, CE, MMD) cannot accurately capture higher-order moment discrepancies, leading to suboptimal coresets. In this work, we propose FAST, the first DNN-free distribution-matching coreset selection framework that formulates the coreset selection task as a graph-constrained optimization problem grounded in spectral graph theory and employs the Characteristic Function Distance (CFD) to capture full distributional information in the frequency domain. We fur...