[2602.20344] Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
Summary
This paper presents GraSPNet (Graph Semantic Predictive Network), a hierarchical self-supervised learning framework for molecular representation. It models molecules at both the atom level and the fragment level, and learns embeddings through masked semantic prediction at each level.
Why It Matters
Existing graph self-supervised learning methods focus mostly on node- or edge-level information and often ignore chemically relevant substructures, which strongly influence molecular properties. GraSPNet targets that gap by adding fragment-level semantics. This is relevant to fields such as drug discovery and materials science, where labeled molecular data is expensive to obtain.
Key Takeaways
- GraSPNet models both atomic and fragment-level semantics for molecular graphs.
- The framework utilizes multi-level message passing and masked semantic prediction.
- Extensive experiments show GraSPNet outperforms existing graph self-supervised learning (GSSL) methods in transfer learning settings.
- The approach enables learning of expressive and transferable molecular representations.
- This research could enhance applications in drug discovery and molecular analysis.
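The two mechanisms named in the takeaways, multi-level message passing and masked prediction, can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the mean-aggregation update, the mean pooling from atoms to fragments, the neighbour-average "predictor", and every function name below are assumptions chosen for readability. A real model would use learned encoders and a learned predictor.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def message_pass(features, edges):
    """One round of mean aggregation: each node averages itself with its neighbours."""
    neighbours = {i: [i] for i in range(len(features))}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return [mean_vec([features[j] for j in neighbours[i]])
            for i in range(len(features))]


def pool_fragments(atom_features, fragments):
    """Lift atom embeddings to fragment embeddings by averaging each fragment's atoms."""
    return [mean_vec([atom_features[a] for a in frag]) for frag in fragments]


def masked_prediction_loss(features, edges, masked):
    """Squared error between a masked node's embedding and a prediction
    built from its neighbours (the masked node needs at least one neighbour).

    Hypothetical stand-in for a learned predictor: the prediction is just
    the mean of the neighbouring embeddings.
    """
    target = features[masked]
    context = [features[v] if u == masked else features[u]
               for u, v in edges if masked in (u, v)]
    prediction = mean_vec(context)
    return sum((p - t) ** 2 for p, t in zip(prediction, target))


# Toy molecule: a 4-atom chain split into two 2-atom fragments.
atom_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
bonds = [(0, 1), (1, 2), (2, 3)]
fragments = [[0, 1], [2, 3]]
fragment_edges = [(0, 1)]  # the two fragments share bond (1, 2)

h_atoms = message_pass(atom_feats, bonds)             # atom level
h_frags = pool_fragments(h_atoms, fragments)          # atoms -> fragments
h_frags_mp = message_pass(h_frags, fragment_edges)    # fragment level

# Masked prediction at both levels: hide one atom and one fragment.
atom_loss = masked_prediction_loss(h_atoms, bonds, masked=1)
frag_loss = masked_prediction_loss(h_frags, fragment_edges, masked=0)
total_loss = atom_loss + frag_loss
```

The point of the sketch is the shape of the computation: the same message-passing and masked-prediction routines run once on the atom graph and once on the coarser fragment graph, and the two losses are combined.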
Computer Science > Machine Learning
arXiv:2602.20344 (cs)
[Submitted on 23 Feb 2026]
Title: Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction
Authors: Jiele Wu, Haozhe Ma, Zhihan Guo, Thanh Vinh Vo, Tze Yun Leong
Abstract: Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations, making it particularly valuable in domains with high labeling costs such as molecular graph analysis. However, existing GSSL methods mostly focus on node- or edge-level information, often ignoring chemically relevant substructures which strongly influence molecular properties. In this work, we propose Graph Semantic Predictive Network (GraSPNet), a hierarchical self-supervised framework that explicitly models both atomic-level and fragment-level semantics. GraSPNet decomposes molecular graphs into chemically meaningful fragments without predefined vocabularies and learns node- and fragment-level representations through multi-level message passing with masked semantic prediction at both levels. This hierarchical semantic supervision enables GraSPNet to learn multi-resolution structural information that is both expressive and transferable. Extensive experiments o...
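The abstract does not say how GraSPNet decomposes a molecule into fragments, only that it does so without a predefined vocabulary. As a rough illustration of what a vocabulary-free decomposition can look like, the sketch below uses a purely structural rule that is an assumption of this example, not the paper's method: fused ring systems become one fragment each, and connected runs of non-ring atoms become the remaining fragments. The function names `find_bridges` and `decompose` are hypothetical.

```python
from collections import defaultdict


def find_bridges(n_atoms, bonds):
    """Return indices of bonds that lie on no ring (graph bridges), via DFS low-links."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(bonds):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc = [-1] * n_atoms   # discovery time, -1 = unvisited
    low = [0] * n_atoms     # earliest discovery time reachable from the subtree
    bridges = set()
    timer = 0

    def dfs(u, parent_edge):
        nonlocal timer
        disc[u] = low[u] = timer
        timer += 1
        for v, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[v] == -1:
                dfs(v, idx)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(idx)
            else:
                low[u] = min(low[u], disc[v])

    for start in range(n_atoms):
        if disc[start] == -1:
            dfs(start, -1)
    return bridges


def decompose(n_atoms, bonds):
    """Split atoms into fragments: ring systems, plus chains of non-ring atoms."""
    bridges = find_bridges(n_atoms, bonds)
    ring_atoms = set()
    for idx, (u, v) in enumerate(bonds):
        if idx not in bridges:
            ring_atoms.update((u, v))

    parent = list(range(n_atoms))  # union-find over atoms

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for idx, (u, v) in enumerate(bonds):
        ring_bond = idx not in bridges
        chain_bond = u not in ring_atoms and v not in ring_atoms
        if ring_bond or chain_bond:
            parent[find(u)] = find(v)

    groups = defaultdict(list)
    for atom in range(n_atoms):
        groups[find(atom)].append(atom)
    return sorted(groups.values())


# Ethylbenzene-like skeleton: six-membered ring (atoms 0-5) with a two-atom chain (6, 7).
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (6, 7)]
fragments = decompose(8, bonds)
```

For this skeleton the rule yields two fragments, the ring and the side chain. Because the rule depends only on graph topology, no fragment dictionary is needed, which is the property the abstract highlights. Chemically informed schemes would also consider bond types and functional groups.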