[2602.15277] Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization
Summary
This paper presents Exploration-Exploitation Distillation (E^2D), a method for efficient large-scale dataset distillation that balances accuracy and computational efficiency, reporting an 18x speedup on ImageNet-1K alongside accuracy gains.
Why It Matters
As machine learning models grow in complexity, the need for efficient dataset distillation becomes critical. E^2D addresses the trade-off between accuracy and efficiency, making it a valuable contribution for researchers and practitioners aiming to optimize model training while managing resource constraints.
Key Takeaways
- E^2D minimizes redundant computation in dataset distillation.
- The method achieves an 18x speedup on ImageNet-1K while also improving accuracy.
- A two-phase optimization strategy enhances convergence by focusing on high-loss regions.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.15277 (cs) [Submitted on 17 Feb 2026]
Title: Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization
Authors: Muhammad J. Alahmadi, Peng Gao, Feiyi Wang, Dongkuan (DK) Xu
Abstract: Dataset distillation compresses the original data into compact synthetic datasets, reducing training time and storage while retaining model performance, enabling deployment under limited resources. Although recent decoupling-based methods enable dataset distillation at large scale, they still face an efficiency gap: optimization-based decoupling methods achieve higher accuracy but demand intensive computation, whereas optimization-free decoupling methods are efficient but sacrifice accuracy. To overcome this trade-off, we propose Exploration-Exploitation Distillation (E^2D), a simple, practical method that minimizes redundant computation through an efficient pipeline that begins with full-image initialization to preserve semantic integrity and feature diversity. It then uses a two-phase optimization strategy: an exploration phase that performs uniform updates and identifies high-loss regions, and an exploitation phase that focuses updates on these regions to accelerate convergence. We evalua...
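The two-phase strategy described in the abstract can be sketched with a toy example. Everything here is illustrative, not the paper's implementation: the per-region surrogate loss, the grid partitioning, the learning rate, and the `top_k` selection are all assumptions standing in for E^2D's actual distillation objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def region_losses(image, grid=4):
    """Hypothetical per-region loss: mean squared deviation of each
    grid cell from a zero target (a stand-in for the real objective)."""
    h, w = image.shape
    gh, gw = h // grid, w // grid
    losses = np.empty((grid, grid))
    for i in range(grid):
        for j in range(grid):
            cell = image[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            losses[i, j] = np.mean(cell ** 2)
    return losses

def update_region(image, i, j, grid=4, lr=0.5):
    """Toy update step: shrink one grid cell toward the zero target."""
    h, w = image.shape
    gh, gw = h // grid, w // grid
    image[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw] *= (1.0 - lr)

def distill(image, grid=4, explore_steps=3, exploit_steps=5, top_k=4):
    # Exploration phase: uniform updates over every region.
    for _ in range(explore_steps):
        for i in range(grid):
            for j in range(grid):
                update_region(image, i, j, grid)
    # Identify the high-loss regions after exploration.
    losses = region_losses(image, grid)
    flat = np.argsort(losses.ravel())[::-1][:top_k]
    hot = [divmod(int(idx), grid) for idx in flat]
    # Exploitation phase: concentrate updates on those regions only,
    # skipping the redundant computation on already-converged regions.
    for _ in range(exploit_steps):
        for i, j in hot:
            update_region(image, i, j, grid)
    return image, hot

image, hot = distill(rng.normal(size=(32, 32)))
```

The design point the sketch illustrates is that after a brief uniform pass, most update budget is spent only where the loss is still high, which is what accelerates convergence in the exploitation phase.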