
[2501.10466] Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction

arXiv - AI, February 18, 2026

Summary

This paper makes semi-supervised adversarial training (SSAT) more efficient by using latent clustering-based data reduction to select or generate a small, critical subset of extra data near the model's decision boundary, lowering training cost while maintaining robustness.

Why It Matters

SSAT is the current state of the art for adversarially robust training, but it needs substantial extra unlabeled or synthetic data to reach high robustness, which prolongs training and increases memory usage. By reducing how much extra data is required, this research makes robust training more practical for real-world applications.

Key Takeaways

Introduces data reduction strategies to optimize semi-supervised adversarial training.
Uses latent clustering techniques to select or generate critical data samples near the model's decision boundary.
Achieves robust model performance with significantly less unlabeled data.
Cuts training time by a factor of 3 to 4 compared with traditional SSAT methods.
Demonstrates effectiveness through comprehensive experiments across image benchmarks.

Computer Science > Machine Learning
arXiv:2501.10466 (cs)
[Submitted on 15 Jan 2025 (v1), last revised 17 Feb 2026 (this version, v3)]

Title: Efficient Semi-Supervised Adversarial Training via Latent Clustering-Based Data Reduction
Authors: Somrita Ghosh, Yuelin Xu, Xiao Zhang

Abstract: Learning robust models under adversarial settings is widely recognized as requiring a considerably large number of training samples. Recent work proposes semi-supervised adversarial training (SSAT), which utilizes external unlabeled or synthetically generated data and is currently the state of the art. However, SSAT requires substantial extra data to attain high robustness, resulting in prolonged training time and increased memory usage. In this paper, we propose data reduction strategies to improve the efficiency of SSAT by optimizing the amount of additional data incorporated. Specifically, we design novel latent clustering-based techniques to select or generate a small, critical subset of data samples near the model's decision boundary. While focusing on boundary-adjacent points, our methods maintain a balanced ratio between boundary and non-boundary data points, thereby avoiding overfitting. Comprehensive experiments across image benchmarks demonstrate that our methods can effectively reduce SSAT's data r...
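
To make the selection idea concrete, here is a minimal, dependency-free Python sketch of choosing boundary-adjacent points in a latent space while keeping a fixed share of non-boundary points. It is not the authors' implementation. It rests on several assumptions that do not come from the abstract: plain k-means stands in for the clustering step, the gap between a point's two nearest centroid distances stands in for closeness to the model's decision boundary, and the names kmeans, boundary_margin, select_subset, and boundary_frac are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples (assumed stand-in)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids


def boundary_margin(p, centroids):
    """Gap between the two nearest centroid distances; a small gap means the
    point lies near the border between two clusters (assumed boundary proxy)."""
    d = sorted(math.dist(p, c) for c in centroids)
    return d[1] - d[0]


def select_subset(latents, k, budget, boundary_frac=0.5, seed=0):
    """Return sorted indices of `budget` samples: a `boundary_frac` share taken
    from the smallest-margin points, the rest drawn at random from the remainder."""
    centroids = kmeans(latents, k, seed=seed)
    order = sorted(range(len(latents)), key=lambda i: boundary_margin(latents[i], centroids))
    n_boundary = round(budget * boundary_frac)
    boundary_idx = order[:n_boundary]
    rng = random.Random(seed)
    rest_idx = rng.sample(order[n_boundary:], budget - n_boundary)
    return sorted(boundary_idx + rest_idx)


# Toy usage: 200 synthetic 2-D "latent" points around two centers (made-up data).
data_rng = random.Random(1)
latents = [(data_rng.gauss(cx, 1.0), data_rng.gauss(0.0, 1.0))
           for cx in (-3.0, 3.0) for _ in range(100)]
chosen = select_subset(latents, k=2, budget=40, boundary_frac=0.5)
```

In this sketch, boundary_frac controls the balance between boundary and non-boundary points that the abstract credits with avoiding overfitting; the value 0.5 is an arbitrary illustration, not a figure from the paper. In practice the latent vectors would come from a trained model's intermediate representation rather than synthetic Gaussians.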
