[2602.17680] BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs
Summary
BioBridge introduces a novel framework that enhances biological reasoning by integrating protein language models with large language models, improving adaptability and generalization across tasks.
Why It Matters
The development of BioBridge is significant as it addresses the limitations of existing protein language models and large language models in biological contexts. By combining their strengths, it opens new avenues for research and applications in bioinformatics, potentially leading to advancements in protein property prediction and knowledge extraction.
Key Takeaways
- BioBridge enhances protein understanding by integrating domain-specific knowledge with general reasoning capabilities.
- The framework employs Domain-Incremental Continual Pre-training to mitigate catastrophic forgetting.
- BioBridge achieves competitive performance on protein benchmarks and general understanding tasks.
- Cross-modal alignment is facilitated through a PLM-Projector-LLM pipeline.
- The end-to-end optimization supports various tasks, including protein property prediction.
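The cross-modal alignment described above can be sketched as a minimal pipeline. This is an illustrative sketch only: the embedding dimensions, the use of a plain linear projector, and the soft-token prompting scheme are assumptions, since the paper (as excerpted here) does not specify the projector architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
PLM_DIM = 1280   # per-residue embedding size of a protein language model
LLM_DIM = 4096   # token embedding size of the language model

# A linear projector mapping PLM embeddings into the LLM's semantic space.
# The source only names a "PLM-Projector-LLM pipeline"; whether the projector
# is linear, an MLP, etc. is an assumption here.
W = rng.normal(scale=0.02, size=(PLM_DIM, LLM_DIM))
b = np.zeros(LLM_DIM)

def project(protein_embeddings: np.ndarray) -> np.ndarray:
    """Map (seq_len, PLM_DIM) protein embeddings to (seq_len, LLM_DIM)."""
    return protein_embeddings @ W + b

# A 50-residue protein encoded by the PLM (random stand-in values).
protein_emb = rng.normal(size=(50, PLM_DIM))
soft_tokens = project(protein_emb)

# Prepend the projected "soft tokens" to ordinary text-token embeddings,
# so the LLM attends over both modalities in a single input sequence.
text_emb = rng.normal(size=(12, LLM_DIM))  # 12 prompt tokens (stand-in)
llm_input = np.concatenate([soft_tokens, text_emb], axis=0)
print(llm_input.shape)  # (62, 4096)
```

In an end-to-end setup, gradients from the LLM's task loss would flow back through the projector (and optionally the PLM), which is what lets one objective serve both property prediction and question-answering.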
Computer Science > Machine Learning
arXiv:2602.17680 (cs) [Submitted on 4 Feb 2026]
Title: BioBridge: Bridging Proteins and Language for Enhanced Biological Reasoning with LLMs
Authors: Yujia Wang, Jihong Guan, Wengen Li, Shuigeng Zhou, Xuhong Wang
Abstract: Existing Protein Language Models (PLMs) often suffer from limited adaptability to multiple tasks and exhibit poor generalization across diverse biological contexts. In contrast, general-purpose Large Language Models (LLMs) lack the capability to interpret protein sequences and fall short in domain-specific knowledge, limiting their capacity for effective biosemantic reasoning. To combine the advantages of both, we propose BioBridge, a domain-adaptive continual pretraining framework for protein understanding. This framework employs Domain-Incremental Continual Pre-training (DICP) to infuse protein domain knowledge and a general reasoning corpus into an LLM simultaneously, effectively mitigating catastrophic forgetting. Cross-modal alignment is achieved via a PLM-Projector-LLM pipeline, which maps protein sequence embeddings into the semantic space of the language model. Ultimately, an end-to-end optimization is adopted to uniformly support various tasks, including protein property prediction and knowledge question-answering. Our proposed BioBridge ...
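The DICP idea of infusing domain text while retaining general reasoning data to curb catastrophic forgetting can be sketched as a simple corpus-mixing loop. The mixing ratio, sampling scheme, and corpus contents below are assumptions for illustration; the paper does not give these details in the excerpt above.

```python
import random

random.seed(0)

# Stand-in corpora (assumptions; real corpora would be tokenized documents).
protein_corpus = [f"protein_doc_{i}" for i in range(1000)]
general_corpus = [f"general_doc_{i}" for i in range(1000)]

# Probability of drawing a general-reasoning sample at each step. Keeping a
# fraction of general data in the mix is one common way to reduce
# catastrophic forgetting; the ratio BioBridge actually uses is not stated.
GENERAL_RATIO = 0.25

def sample_batch(batch_size: int) -> list[str]:
    """Draw one mixed batch of protein-domain and general-reasoning samples."""
    batch = []
    for _ in range(batch_size):
        if random.random() < GENERAL_RATIO:
            batch.append(random.choice(general_corpus))
        else:
            batch.append(random.choice(protein_corpus))
    return batch

batch = sample_batch(32)
n_general = sum(doc.startswith("general") for doc in batch)
print(len(batch), n_general)
```

Each mixed batch would then feed a standard next-token pretraining step, so the model sees domain knowledge and general text interleaved rather than sequentially.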