[2602.19912] De novo molecular structure elucidation from mass spectra via flow matching
Summary
This article presents MSFlow, a two-stage encoder-decoder flow-matching generative model for de novo molecular structure elucidation from mass spectra. The authors report state-of-the-art performance on the structure elucidation task for small molecules.
Why It Matters
Translating mass spectra into full molecular structures is a difficult, under-defined inverse problem. Solving it matters for biological insight, the discovery of new metabolites, and chemical research across multiple fields. MSFlow's reported state-of-the-art performance on small molecules is a step toward that goal.
Key Takeaways
- MSFlow achieves 45% accuracy in translating mass spectra to molecular structures.
- The model employs a two-stage encoder-decoder approach for effective structure elucidation.
- A formula-restricted transformer model is used for encoding mass spectra into informative embeddings.
- The research demonstrates a fourteen-fold improvement over previous methods.
- A trained version of MSFlow is publicly available for non-commercial use.
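The flow-matching idea named in the title can be illustrated with a small sketch. This summary does not give the paper's architecture or training details, so the following is a generic conditional flow-matching objective with a straight-line path, written in NumPy. It is not MSFlow's implementation. The names `velocity_fn` and `cond` (standing in for a spectrum embedding) and all array shapes are hypothetical.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # one time value per sample
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return x1 - x0


def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity.

    velocity_fn(x_t, t, cond) is a hypothetical model; cond is a
    hypothetical conditioning vector such as a spectrum embedding.
    """
    x_t = interpolate(x0, x1, t)
    v_pred = velocity_fn(x_t, t, cond)
    return float(np.mean((v_pred - target_velocity(x0, x1)) ** 2))


def euler_sample(velocity_fn, x0, cond, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

In training, `x0` would be noise samples, `x1` the target representations, and `t` drawn uniformly from [0, 1]. At inference, `euler_sample` starts from fresh noise and follows the learned velocity field under the given conditioning.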
arXiv:2602.19912 (cs), Computer Science > Machine Learning
[Submitted on 23 Feb 2026]
Title: De novo molecular structure elucidation from mass spectra via flow matching
Authors: Ghaith Mqawass (1,2), Tuan Le (2), Fabian Theis (1,3,4), Djork-Arné Clevert (2)
(1) TUM School of Life Sciences Weihenstephan, Technical University of Munich, Germany
(2) Machine Learning and Computational Sciences, Pfizer Research & Development, Berlin, Germany
(3) TUM School of Computation, Information and Technology, Technical University of Munich, Germany
(4) Institute of Computational Biology, Helmholtz Center Munich, Germany
Abstract: Mass spectrometry is a powerful and widely used tool for identifying molecular structures due to its sensitivity and ability to profile complex samples. However, translating spectra into full molecular structures is a difficult, under-defined inverse problem. Overcoming this problem is crucial for enabling biological insight, discovering new metabolites, and advancing chemical research across multiple fields. To this end, we develop MSFlow, a two-stage encoder-decoder flow-matching generative model that achieves state-of-the-art performance on the structure elucidation task for small molecules. In the first stage, we adopt a formula-restricted transformer model for encoding mass spectr...