[2602.17709] UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
Summary
UBio-MolFM presents a universal molecular foundation model designed to enhance all-atom molecular simulations, bridging the gap between quantum accuracy and biological scale.
Why It Matters
This research addresses a core limitation of current molecular simulations: they must trade quantum-mechanical accuracy against biological scale. By introducing UBio-MolFM, the authors provide a tool that could significantly advance computational biology, enabling more precise modeling of complex biological systems.
Key Takeaways
- UBio-MolFM integrates UBio-Mol26, a large bio-specific dataset, for improved molecular modeling.
- The E2Former-V2 linear-scaling equivariant transformer captures non-local physics with up to ~4x higher inference throughput on large systems.
- A Three-Stage Curriculum Learning protocol improves energy-force consistency in simulations.
- The model achieves ab initio-level fidelity for large biomolecular systems.
- Together, these components position the framework to scale quantum-accurate simulation to biologically relevant system sizes.
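The curriculum idea above can be made concrete as a loss-weight schedule. The sketch below is an illustrative assumption, not the authors' actual protocol: the stage boundaries (`stage1_end`, `stage2_end`), the linear ramp, and the weight values are all hypothetical; the paper only states that training transitions from energy initialization to energy-force consistency.

```python
# Hedged sketch of a three-stage curriculum for energy/force loss weights.
# Stage boundaries and weights are illustrative, not from the paper.

def curriculum_weights(step: int, stage1_end: int = 10_000,
                       stage2_end: int = 50_000) -> tuple[float, float]:
    """Return (energy_weight, force_weight) for a given training step."""
    if step < stage1_end:                 # Stage 1: energy-only initialization
        return 1.0, 0.0
    if step < stage2_end:                 # Stage 2: linearly ramp in forces
        frac = (step - stage1_end) / (stage2_end - stage1_end)
        return 1.0, frac
    return 1.0, 1.0                       # Stage 3: full energy-force consistency

def total_loss(e_pred, e_ref, f_pred, f_ref, step):
    """Weighted sum of mean-squared energy and force errors."""
    w_e, w_f = curriculum_weights(step)
    mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return w_e * mse(e_pred, e_ref) + w_f * mse(f_pred, f_ref)
```

In practice the force term usually dominates the gradient signal for interatomic potentials, so ramping its weight gradually (rather than switching it on abruptly) is a common way to keep early training stable.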
Physics > Chemical Physics
arXiv:2602.17709 [physics] (Submitted on 13 Feb 2026)

Title: UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems
Authors: Lin Huang, Arthur Jiang, XiaoLi Liu, Zion Wang, Jason Zhao, Chu Wang, HaoCheng Lu, ChengXiang Huang, JiaJun Cheng, YiYue Du, Jia Zhang

Abstract: All-atom molecular simulation serves as a quintessential "computational microscope" for understanding the machinery of life, yet it remains fundamentally limited by the trade-off between quantum-mechanical (QM) accuracy and biological scale. We present UBio-MolFM, a universal foundation model framework specifically engineered to bridge this gap. UBio-MolFM introduces three synergistic innovations: (1) UBio-Mol26, a large bio-specific dataset constructed via a multi-fidelity "Two-Pronged Strategy" that combines systematic bottom-up enumeration with top-down sampling of native protein environments (up to 1,200 atoms); (2) E2Former-V2, a linear-scaling equivariant transformer that integrates Equivariant Axis-Aligned Sparsification (EAAS) and Long-Short Range (LSR) modeling to capture non-local physics with up to ~4x higher inference throughput in our large-system benchmarks; and (3) a Three-Stage Curriculum Learning protocol that transitions from energy initialization to energy-force consistency, with force-focuse...
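The "energy-force consistency" the abstract refers to is the physical requirement that a conservative potential's forces equal the negative gradient of its energy, F = -dE/dx. As a hedged illustration of how that property can be checked numerically, the sketch below uses a toy harmonic potential (an assumption for demonstration, not the paper's model) and a central finite difference:

```python
# Hedged sketch: verifying F = -dE/dx for a conservative potential.
# The harmonic potential here is a toy stand-in, not the paper's model.

def energy(x: float) -> float:
    """Toy harmonic potential E(x) = 0.5 * k * x^2 with k = 2."""
    return 0.5 * 2.0 * x * x

def analytic_force(x: float) -> float:
    """Exact force F = -dE/dx = -k * x for the toy potential."""
    return -2.0 * x

def numeric_force(e_fn, x: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of -dE/dx."""
    return -(e_fn(x + h) - e_fn(x - h)) / (2 * h)
```

In machine-learned potentials this consistency is typically enforced by differentiating the predicted energy with automatic differentiation rather than predicting forces with a separate head, which guarantees a conservative force field by construction.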