[2602.23666] Active Learning for Planet Habitability Classification under Extreme Class Imbalance
arXiv:2602.23666 (astro-ph), Earth and Planetary Astrophysics
Submitted on 27 Feb 2026
Authors: R. I. El-Kholy, Z. M. Hayman
Abstract: The increasing size and heterogeneity of exoplanet catalogs have made systematic habitability assessment challenging, particularly given the extreme scarcity of potentially habitable planets and the evolving nature of their labels. In this study, we explore the use of pool-based active learning to improve the efficiency of habitability classification under realistic observational constraints. We construct a unified dataset from the Habitable World Catalog and the NASA Exoplanet Archive and formulate habitability assessment as a binary classification problem. A supervised baseline based on gradient-boosted decision trees is established and optimized for recall in order to prioritize the identification of rare potentially habitable planets. This model is then embedded within an active learning framework, where uncertainty-based margin sampling is compared against random querying across multiple runs and labeling budgets. We find that active learning substantially reduces the number of labeled instances required to approach supervised performance, demonstrating c...
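The abstract frames habitability as a binary label attached to a merged catalog. As a minimal sketch of that formulation only: the abstract does not say how the two catalogs are joined or which features are used, so the record layout (a "name" key plus numeric feature keys), the name-matching rule `match_key`, the helper `label_archive`, and the drop-rows-with-missing-features policy are all hypothetical. An archive planet gets label 1 if its name appears in the habitability catalog's list, else 0.

```python
from typing import Dict, List, Sequence, Tuple


def match_key(name: str) -> str:
    """Hypothetical matching rule: lowercase and drop spaces/punctuation,
    so 'Planet-A b' and 'planet a b' map to the same key."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def label_archive(
    rows: List[Dict[str, object]],
    habitable_names: Sequence[str],
    feature_keys: Sequence[str],
) -> Tuple[List[List[float]], List[int]]:
    """Build (X, y): y = 1 if an archive planet is listed in the habitability
    catalog, else 0. Rows missing any feature are dropped (one simple policy)."""
    habitable = {match_key(n) for n in habitable_names}
    X: List[List[float]] = []
    y: List[int] = []
    for row in rows:
        values = [row.get(k) for k in feature_keys]
        if any(v is None for v in values):
            continue  # skip incomplete rows rather than imputing
        X.append([float(v) for v in values])
        y.append(1 if match_key(str(row["name"])) in habitable else 0)
    return X, y
```

The fraction `sum(y) / len(y)` then quantifies the class imbalance the title refers to. A real cross-match would need a more careful rule than name normalization, since catalogs can spell designations differently.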
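The abstract says the gradient-boosted baseline is "optimized for recall" but does not say how. One common approach, shown here as an assumption rather than the authors' procedure, is to keep the trained model and lower its decision threshold on validation scores until a target recall is met. The helper names `threshold_for_recall` and `recall_precision` are hypothetical. Alternatives include using recall as the hyperparameter-search objective or reweighting the rare class.

```python
import math
from typing import Sequence, Tuple


def threshold_for_recall(scores: Sequence[float], labels: Sequence[int],
                         target_recall: float) -> float:
    """Largest threshold t such that predicting 'habitable' when score >= t
    reaches at least `target_recall` on the given (validation) data."""
    pos = sorted((s for s, t in zip(scores, labels) if t == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one positive example")
    # Number of positives that must be recovered; the small epsilon keeps
    # float noise (e.g. 0.7 * 10 == 7.000000000000001) from rounding up.
    needed = math.ceil(target_recall * len(pos) - 1e-9)
    needed = max(1, min(needed, len(pos)))
    return pos[needed - 1]


def recall_precision(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Recall and precision when predicting positive for score >= threshold."""
    tp = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 1)
    fp = sum(1 for s, t in zip(scores, labels) if s >= threshold and t == 0)
    fn = sum(1 for s, t in zip(scores, labels) if s < threshold and t == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```

For example, with scores `[0.9, 0.8, 0.35, 0.3, 0.2, 0.1]` and labels `[1, 0, 1, 0, 1, 0]`, a target recall of 1.0 gives a threshold of 0.2, which recovers every positive at a precision of 0.6. This is the trade-off a recall-first model accepts when positives are rare.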
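The pool-based loop compared in the abstract can be sketched as follows, assuming a batch setting and a pluggable model. Each round refits on the labeled set and then queries either the pool items with the smallest margin (for a binary classifier with P(habitable) = p, the gap between the top two class probabilities is |2p - 1|) or a uniformly random batch. The `Fitter`/`Model` interfaces, the batch size, and `fit_midpoint` are hypothetical. `fit_midpoint` is a toy one-feature stand-in for the gradient-boosted model, used only so the sketch runs without third-party libraries.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical interfaces: a model maps one feature row to P(habitable);
# a fitter builds a model from labeled rows and 0/1 labels.
Model = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[int]], Model]


def margin(p_habitable: float) -> float:
    """Binary margin |p - (1 - p)| = |2p - 1|; 0 means maximally uncertain."""
    return abs(2.0 * p_habitable - 1.0)


def query_margin(model: Model, X: Sequence[Sequence[float]],
                 pool: List[int], k: int) -> List[int]:
    """Margin sampling: the k unlabeled indices with the smallest margin."""
    return sorted(pool, key=lambda i: margin(model(X[i])))[:k]


def query_random(pool: List[int], k: int, rng: random.Random) -> List[int]:
    """Random-querying baseline: k distinct unlabeled indices, uniformly."""
    return rng.sample(pool, k)


def active_learning(X: Sequence[Sequence[float]], y: Sequence[int],
                    seed_idx: List[int], fit: Fitter, budget: int, batch: int,
                    strategy: str = "margin", seed: int = 0) -> List[int]:
    """Run the loop until `budget` labels are spent; return labeled indices
    in acquisition order. `y` plays the oracle: a label is read only once
    its index is in the seed set or has been queried."""
    rng = random.Random(seed)
    labeled = list(seed_idx)
    in_seed = set(seed_idx)
    pool = [i for i in range(len(X)) if i not in in_seed]
    spent = 0
    while spent < budget and pool:
        k = min(batch, budget - spent, len(pool))
        model = fit([X[i] for i in labeled], [y[i] for i in labeled])
        if strategy == "margin":
            picked = query_margin(model, X, pool, k)
        else:
            picked = query_random(pool, k, rng)
        labeled.extend(picked)
        spent += len(picked)
        chosen = set(picked)
        pool = [i for i in pool if i not in chosen]
    return labeled


def fit_midpoint(rows: List[Sequence[float]], labels: List[int]) -> Model:
    """Toy stand-in model on feature 0: a sigmoid centred midway between the
    class means. Assumes positives sit at larger values and that both
    classes are present in `rows`."""
    pos = statistics.mean(r[0] for r, t in zip(rows, labels) if t == 1)
    neg = statistics.mean(r[0] for r, t in zip(rows, labels) if t == 0)
    cut = (pos + neg) / 2.0

    def predict(row: Sequence[float]) -> float:
        z = max(-50.0, min(50.0, 10.0 * (row[0] - cut)))  # clamp: no overflow
        return 1.0 / (1.0 + math.exp(-z))

    return predict
```

Under extreme imbalance the seed set must already contain at least one positive, or the first fit has nothing to learn from. To mirror the comparison in the abstract, one would run both strategies over several seeds and, at each labeling budget, score the refitted model on a held-out set against the fully supervised baseline.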