[2507.12784] A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys
Summary
This article presents a semi-supervised learning method to identify poor-quality exposures in large astronomical imaging surveys, enhancing data quality control.
Why It Matters
As astronomical imaging surveys grow in size, traditional quality control methods such as visual inspection by human experts become impractical. This research introduces a scalable machine-learning approach to identifying bad exposures, which is important for maintaining data integrity in astrophysics.
Key Takeaways
- Introduces a semi-supervised learning pipeline for detecting bad exposures in imaging surveys.
- Combines a vision transformer (ViT), trained via self-supervised learning (SSL), with a k-Nearest Neighbor (kNN) classifier to flag poor-quality exposures.
- Demonstrates the method's application on the DECam Legacy Survey, identifying 780 problematic exposures.
- Offers a scalable solution for quality control applicable to other large imaging datasets.
- Highlights the importance of machine learning in handling increasing data volumes in astrophysics.
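The ViT-plus-kNN idea in the takeaways above can be illustrated with a minimal sketch. It is not the authors' implementation: the four-dimensional vectors are synthetic stand-ins for ViT embeddings, and the names `fake_embedding` and `knn_predict`, the use of cosine similarity, and `k=5` are all assumptions made for illustration. The sketch shows only the final step, where a new exposure takes the majority label of its nearest labeled neighbors in embedding space.

```python
import math
import random
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, embeddings, labels, k=5):
    """Return the majority label among the k labeled embeddings most similar to query."""
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: cosine_similarity(query, embeddings[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def fake_embedding(center, rng, noise=0.1):
    """Hypothetical stand-in for a ViT embedding: a cluster center plus Gaussian noise."""
    return [c + rng.gauss(0.0, noise) for c in center]


rng = random.Random(0)  # fixed seed so the toy data are reproducible

# Assumed cluster centers for "good" and "bad" exposures (illustrative only).
GOOD_CENTER = [1.0, 0.0, 0.0, 0.0]
BAD_CENTER = [0.0, 1.0, 0.0, 0.0]

# A small labeled set, mirroring the "small set of labeled exposures" idea.
labeled_embeddings = (
    [fake_embedding(GOOD_CENTER, rng) for _ in range(20)]
    + [fake_embedding(BAD_CENTER, rng) for _ in range(20)]
)
labeled_classes = ["good"] * 20 + ["bad"] * 20

# Two unlabeled exposures to classify.
new_good = fake_embedding(GOOD_CENTER, rng)
new_bad = fake_embedding(BAD_CENTER, rng)

prediction_good = knn_predict(new_good, labeled_embeddings, labeled_classes)
prediction_bad = knn_predict(new_bad, labeled_embeddings, labeled_classes)
print(prediction_good, prediction_bad)
```

In a real pipeline the embeddings would come from the self-supervised ViT rather than from synthetic clusters. The choice of distance metric and of k would need to be validated against the labeled set.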
Astrophysics > Instrumentation and Methods for Astrophysics
arXiv:2507.12784 (astro-ph)
[Submitted on 17 Jul 2025 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys
Authors: Yufeng Luo, Adam D. Myers, Alex Drlica-Wagner, Dario Dematties, Salma Borchani, Francisco Valdes, Arjun Dey, David Schlegel, Rongpu Zhou, DESI Legacy Imaging Surveys Team
Abstract: As the data volume of astronomical imaging surveys rapidly increases, traditional methods for image anomaly detection, such as visual inspection by human experts, are becoming impractical. We introduce a machine-learning-based approach to detect poor-quality exposures in large imaging surveys, with a focus on the DECam Legacy Survey (DECaLS) in regions of low extinction (i.e., $E(B-V)<0.04$). Our semi-supervised pipeline integrates a vision transformer (ViT), trained via self-supervised learning (SSL), with a k-Nearest Neighbor (kNN) classifier. We train and validate our pipeline using a small set of labeled exposures observed by surveys with the Dark Energy Camera (DECam). A clustering-space analysis of where our pipeline places images labeled in "good" and "bad" categories suggests that our approach can efficiently and accurate...