[2507.00407] Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials
Summary
This paper obtains 3D molecular geometries from machine learning interatomic potential (MLIP) models rather than from expensive methods such as density functional theory (DFT), and uses those geometries to improve downstream molecular property prediction.
Why It Matters
Accurate molecular property prediction requires 3D geometries, which are typically obtained with expensive methods such as DFT. Obtaining them from MLIP models instead could make molecular modeling in chemistry and materials science more efficient and scalable.
Key Takeaways
- Machine learning interatomic potential models can predict molecular geometries.
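- A large-scale molecular relaxation dataset of 3.5 million molecules and 300 million snapshots was curated for training.
- Geometry optimization with MLIP models yields approximate low-energy geometries that improve downstream property predictions over non-relaxed structures, although they do not consistently reach DFT-level chemical accuracy or convergence.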
- Fine-tuning based on relaxed geometries mitigates biases in predictions.
- The approach offers a practical alternative to traditional density functional theory methods.
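To make the geometry optimization takeaway concrete, here is a minimal sketch of relaxing a structure by following the forces returned by an energy-and-forces model. It is an illustration under stated assumptions, not the paper's implementation: `toy_energy_and_forces` is a hypothetical harmonic-bond potential standing in for a trained MLIP, the optimizer is plain steepest descent, and the names `relax`, `step`, and `fmax` are chosen here for illustration only.

```python
import numpy as np


def toy_energy_and_forces(pos, bonds, r0=1.0, k=1.0):
    """Hypothetical stand-in for an MLIP: harmonic bond energy and forces.

    A trained MLIP would map 3D coordinates to (energy, forces) in the same
    way; this toy potential only keeps the example self-contained.
    """
    energy = 0.0
    forces = np.zeros_like(pos)
    for i, j in bonds:
        d = pos[i] - pos[j]
        r = np.linalg.norm(d)
        energy += 0.5 * k * (r - r0) ** 2
        grad_i = k * (r - r0) * d / r  # dE/d(pos[i]); dE/d(pos[j]) is its negative
        forces[i] -= grad_i            # force = -gradient of the energy
        forces[j] += grad_i
    return energy, forces


def relax(pos, bonds, step=0.1, fmax=1e-4, max_steps=10_000):
    """Steepest-descent relaxation: move atoms along the forces until the
    largest force component drops below fmax (or max_steps is reached)."""
    pos = np.array(pos, dtype=float)
    for n in range(max_steps):
        energy, forces = toy_energy_and_forces(pos, bonds)
        if np.abs(forces).max() < fmax:
            break
        pos += step * forces
    return pos, energy, n


# Distorted three-atom chain with bonds 0-1 and 1-2 (arbitrary units).
start = [[0.0, 0.0, 0.0], [1.4, 0.2, 0.0], [2.1, 1.5, 0.3]]
relaxed, final_energy, n_steps = relax(start, bonds=[(0, 1), (1, 2)])
print(n_steps, final_energy)
print(np.linalg.norm(relaxed[0] - relaxed[1]), np.linalg.norm(relaxed[1] - relaxed[2]))
```

With a real MLIP, only the energy-and-forces call would change. In practice a quasi-Newton or FIRE-style optimizer is usually preferred over steepest descent, which is used here only because it is the shortest way to show the loop.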
Physics > Chemical Physics
arXiv:2507.00407 (physics)
[Submitted on 1 Jul 2025 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: Augmenting Molecular Graphs with Geometries via Machine Learning Interatomic Potentials
Authors: Cong Fu, Yuchao Lin, Zachary Krueger, Haiyang Yu, Maho Nakata, Jianwen Xie, Emine Kucukbenli, Xiaofeng Qian, Shuiwang Ji
Abstract: Accurate molecular property predictions require 3D geometries, which are typically obtained using expensive methods such as density functional theory (DFT). Here, we attempt to obtain molecular geometries by relying solely on machine learning interatomic potential (MLIP) models. To this end, we first curate a large-scale molecular relaxation dataset comprising 3.5 million molecules and 300 million snapshots. Then MLIP pre-trained models are trained with supervised learning to predict energy and forces given 3D molecular structures. Once trained, we show that the pre-trained models can be used in different ways to obtain geometries either explicitly or implicitly. First, it can be used to obtain approximate low-energy 3D geometries via geometry optimization. While these geometries do not consistently reach DFT-level chemical accuracy or convergence, they can still improve downstream performance compared to non-relaxed structures. To mitigate ...