[2502.05114] SpecTUS: Spectral Translator for Unknown Structures annotation from EI-MS spectra
Summary
The article presents SpecTUS, a deep neural model designed for the structural annotation of small molecules from low-resolution gas chromatography electron ionization mass spectra (GC-EI-MS), outperforming traditional database search methods.
Why It Matters
SpecTUS addresses a critical need in compound identification across various fields, such as drug detection and forensics, by providing a more effective method for analyzing unknown compounds. This advancement could significantly enhance research and practical applications in chemistry and related domains.
Key Takeaways
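- SpecTUS annotates small-molecule structures de novo, translating low-resolution GC-EI-MS spectra directly into a 2D structural representation rather than matching them against a spectral library.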
- On a held-out test set of 28,267 spectra from the NIST database, the model's single suggestion achieves a 43% perfect reconstruction rate, outperforming standard database search techniques.
- In scenarios with multiple suggestions, SpecTUS improves accuracy significantly over traditional methods.
- The model is particularly beneficial for analyzing compounds not available in existing spectral libraries.
- Application areas named in the abstract include drug detection, criminal forensics, small-molecule biomarker discovery, and chemical engineering.
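The "perfect reconstruction rate" for a single suggestion or for several suggestions can be read as a top-k exact-match rate. The sketch below is only an illustration of that idea, not the paper's evaluation code. It assumes every structure is already a canonical string (for example a canonical SMILES), so string equality stands in for structural identity. The function name `top_k_exact_match_rate` and the toy data are hypothetical.

```python
from typing import Sequence


def top_k_exact_match_rate(
    references: Sequence[str],
    ranked_candidates: Sequence[Sequence[str]],
    k: int = 1,
) -> float:
    """Fraction of spectra whose reference structure is among the first k candidates.

    Assumption: all structures are canonical strings, so plain string
    equality is used in place of a real structure comparison.
    """
    if len(references) != len(ranked_candidates):
        raise ValueError("need one candidate list per reference structure")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not references:
        return 0.0
    hits = sum(
        1
        for reference, candidates in zip(references, ranked_candidates)
        if reference in candidates[:k]
    )
    return hits / len(references)


# Toy data invented for illustration only (not taken from the paper).
references = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]
ranked_candidates = [
    ["CCO", "COC"],            # correct at rank 1
    ["C1CCCCC1", "c1ccccc1"],  # correct at rank 2
    ["CC(=O)OC", "CCC(=O)O"],  # reference never suggested
    ["CCN", "CNC"],            # correct at rank 1
]

rate_at_1 = top_k_exact_match_rate(references, ranked_candidates, k=1)
rate_at_2 = top_k_exact_match_rate(references, ranked_candidates, k=2)
print(f"top-1: {rate_at_1:.2f}, top-2: {rate_at_2:.2f}")
```

Allowing more suggestions can only keep the rate the same or raise it, which is why results for a single suggestion and for multiple suggestions are reported separately.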
Computer Science > Machine Learning
arXiv:2502.05114 (cs)
Submitted on 7 Feb 2025 (v1), last revised 20 Feb 2026 (this version, v2)
Title: SpecTUS: Spectral Translator for Unknown Structures annotation from EI-MS spectra
Authors: Adam Hájek, Michal Starý, Elliott Price, Filip Jozefov, Helge Hecht, Aleš Křenek
Abstract: Compound identification and structure annotation from mass spectra is a well-established task widely applied in drug detection, criminal forensics, small molecule biomarker discovery and chemical engineering. We propose SpecTUS: Spectral Translator for Unknown Structures, a deep neural model that addresses the task of structural annotation of small molecules from low-resolution gas chromatography electron ionization mass spectra (GC-EI-MS). Our model analyzes the spectra in de novo manner: a direct translation from the spectra into 2D-structural representation. Our approach is particularly useful for analyzing compounds unavailable in spectral libraries. In a rigorous evaluation of our model on the novel structure annotation task across different libraries, we outperformed standard database search techniques by a wide margin. On a held-out testing set, including 28,267 spectra from the NIST database, we show that our model's single suggestion perfectly reconstruc...