[2601.09768] CLiMB: A Domain-Informed Novelty Detection Clustering Framework for Galactic Archaeology and Scientific Discovery
Summary
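The paper presents CLiMB (CLustering in Multiphase Boundaries), a semi-supervised clustering framework for novelty detection in galactic archaeology. It first anchors known clusters and then uses density-based clustering to look for unknown structures in astrophysical data.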
Why It Matters
Semi-supervised clustering algorithms often assume that the available labels are globally representative, so they tend to enforce rigid constraints that suppress unanticipated patterns or to require the number of clusters in advance. CLiMB targets that limitation, aiming to classify well-characterized astrophysical structures while still identifying novel ones in the same data.
Key Takeaways
- CLiMB decouples prior knowledge exploitation from unknown structure exploration.
- It achieves an Adjusted Rand Index (ARI) of 0.829, outperforming the methods it is compared against.
- The framework demonstrates superior data efficiency, with performance improving as the amount of prior knowledge increases.
- CLiMB successfully isolates distinct dynamical features in unlabelled data.
- This approach has significant implications for future discoveries in galactic archaeology.
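For readers unfamiliar with the Adjusted Rand Index quoted in the takeaways, the following is a minimal sketch of the standard pair-counting definition in plain Python. The function name and the toy label lists are illustrative assumptions; this is not the paper's evaluation code, and the 0.829 figure cannot be reproduced from it.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    1.0 means identical partitions (up to renaming of labels);
    values near 0.0 are what random labelings would give.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: trivially identical

    # Contingency counts: items sharing a (true, predicted) label pair.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of item pairs placed together in each view.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / total_pairs
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. everything in one cluster
    return (together_both - expected) / (maximum - expected)


# Toy check: the same partition with swapped label names scores 1.0.
same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the index is corrected for chance, it can be negative when two labelings agree less than random assignment would.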
Astrophysics > Instrumentation and Methods for Astrophysics
arXiv:2601.09768 (astro-ph)
[Submitted on 14 Jan 2026 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: CLiMB: A Domain-Informed Novelty Detection Clustering Framework for Galactic Archaeology and Scientific Discovery
Authors: Lorenzo Monti, Tatiana Muraveva, Brian Sheridan, Davide Massari, Alessia Garofalo, Gisella Clementini, Umberto Michelucci
Abstract: In data-driven scientific discovery, a challenge lies in classifying well-characterized phenomena while identifying novel anomalies. Current semi-supervised clustering algorithms do not always fully address this duality, often assuming that supervisory signals are globally representative. Consequently, methods often enforce rigid constraints that suppress unanticipated patterns or require a pre-specified number of clusters, rendering them ineffective for genuine novelty detection. To bridge this gap, we introduce CLiMB (CLustering in Multiphase Boundaries), a domain-informed framework decoupling the exploitation of prior knowledge from the exploration of unknown structures. Using a sequential two-phase approach, CLiMB first anchors known clusters using metric-adaptive constrained partitioning, and subsequently applies density-based clustering to res...
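The sequential two-phase idea in the abstract can be illustrated with a toy sketch in plain Python. Several things here are assumptions rather than the authors' method: phase 1 is replaced by a simple nearest-seed assignment with a fixed Euclidean radius (the paper's metric-adaptive constrained partitioning is not described in this excerpt), phase 2 is a textbook DBSCAN, and running it only on the points left unassigned by phase 1 is a guess at what the truncated sentence describes. All function names, parameters, and data points are hypothetical.

```python
from math import dist


def anchor_known(points, seeds, radius):
    """Phase 1 stand-in (assumption): label a point with its nearest seed
    if it lies within `radius` of that seed, otherwise leave it as None."""
    labels = []
    for p in points:
        name, d = min(((k, dist(p, c)) for k, c in seeds.items()),
                      key=lambda pair: pair[1])
        labels.append(name if d <= radius else None)
    return labels


def dbscan(points, eps, min_samples):
    """Textbook DBSCAN. Returns one cluster id per point, -1 for noise.
    `min_samples` counts the point itself."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1

    def neighbours(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        near = neighbours(i)
        if len(near) < min_samples:
            labels[i] = -1  # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in near if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            near_j = neighbours(j)
            if len(near_j) >= min_samples:
                queue.extend(near_j)  # core point: keep growing the cluster
    return labels


def two_phase(points, seeds, radius, eps, min_samples):
    """Anchor known groups first, then search the leftovers for new ones."""
    labels = anchor_known(points, seeds, radius)
    leftover = [i for i, lab in enumerate(labels) if lab is None]
    found = dbscan([points[i] for i in leftover], eps, min_samples)
    for i, c in zip(leftover, found):
        labels[i] = "noise" if c == -1 else f"new_{c}"
    return labels


# Hypothetical 2D data: one known group, one unlabelled clump, one outlier.
seeds = {"known_A": (0.0, 0.0)}
points = [(0.1, 0.0), (-0.1, 0.1), (0.0, -0.2),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, -9.0)]
result = two_phase(points, seeds, radius=1.0, eps=0.5, min_samples=3)
```

Keeping the two phases separate means the seeds never constrain how many new groups the density step may find, which is the decoupling the abstract emphasizes.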