[2508.04724] Understanding protein function with a multimodal retrieval-augmented foundation model
Summary
This article presents PoET-2, a multimodal retrieval-augmented protein foundation model that improves protein function prediction and variant effect scoring by combining in-context learning of family-specific evolutionary constraints with optional structure conditioning.
Why It Matters
Understanding protein function is crucial for advancements in biotechnology and medicine. PoET-2's innovative approach combines retrieval augmentation with family-specific modeling, potentially transforming how researchers predict protein behavior and design new proteins, which could lead to breakthroughs in drug development and genetic engineering.
Key Takeaways
- PoET-2 achieves state-of-the-art performance in zero-shot variant effect prediction.
- The model integrates multimodal learning with evolutionary constraints for improved protein function understanding.
- It excels at scoring variants with multiple substitutions as well as insertions and deletions (indels).
- PoET-2's embeddings outperform previous methods, especially with limited datasets.
- The approach highlights the benefits of combining retrieval augmentation with family-centric modeling.
Quantitative Biology > Quantitative Methods
arXiv:2508.04724 (q-bio)
[Submitted on 5 Aug 2025 (v1), last revised 26 Feb 2026 (this version, v2)]
Authors: Timothy Fei Truong Jr, Tristan Bepler
Abstract: Protein language models (PLMs) learn probability distributions over natural protein sequences. By learning from hundreds of millions of natural protein sequences, protein understanding and design capabilities emerge. Recent works have shown that scaling these models improves structure prediction, but does not seem to improve mutation understanding and representation quality for protein function prediction. We introduce PoET-2, a multimodal, retrieval-augmented protein foundation model that incorporates in-context learning of family-specific evolutionary constraints with optional structure conditioning to learn generative distributions over protein sequences. PoET-2 uses a hierarchical transformer encoder that is equivariant to sequence context ordering and a dual decoder architecture with both causal and masked language modeling objectives, allowing PoET-2 to operate in both fully generative and bidirectional representation learning modes. PoET-2 achieves state-of-the-art performance on zero-shot variant effect prediction…
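To make the retrieval-augmented, zero-shot scoring idea concrete: a family-conditioned model scores a variant by the log-likelihood ratio between the variant and the wildtype, with both likelihoods conditioned on retrieved homologs. The sketch below is purely illustrative and is not the PoET-2 API; `toy_log_likelihood` is a hypothetical stand-in that replaces the learned model with per-position residue frequencies from the retrieved context.

```python
# Conceptual sketch of retrieval-augmented zero-shot variant scoring.
# All names here are illustrative stand-ins, NOT the actual PoET-2 API.
import math
from collections import Counter

def toy_log_likelihood(sequence, context_seqs):
    """Toy conditional log-likelihood: score each residue by its frequency
    at the same position across retrieved homologs (a crude stand-in for a
    family-conditioned protein language model)."""
    total = 0.0
    for i, aa in enumerate(sequence):
        column = [s[i] for s in context_seqs if i < len(s)]
        counts = Counter(column)
        # Laplace smoothing over the 20 standard amino acids
        p = (counts.get(aa, 0) + 1) / (len(column) + 20)
        total += math.log(p)
    return total

def score_variant(wildtype, variant, homologs):
    """Zero-shot variant effect score: log-likelihood ratio of variant vs.
    wildtype, both conditioned on the same retrieved family context."""
    return (toy_log_likelihood(variant, homologs)
            - toy_log_likelihood(wildtype, homologs))

homologs = ["MKTAYIA", "MKTAYLA", "MKSAYIA", "MKTAFIA"]  # retrieved family
wt = "MKTAYIA"
var = "MKTAWIA"  # Y5W substitution, W never seen at that position
print(score_variant(wt, var, homologs))  # negative: variant disfavored
```

A real model would replace the frequency table with sequence log-probabilities from a conditioned decoder, but the scoring logic (variant-minus-wildtype log-likelihood under a shared family context) is the same.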