[2506.09332] InstructPro: Natural Language Guided Ligand-Binding Protein Design
Computer Science > Machine Learning
arXiv:2506.09332 (cs)
[Submitted on 11 Jun 2025 (v1), last revised 2 Mar 2026 (this version, v3)]

Title: InstructPro: Natural Language Guided Ligand-Binding Protein Design
Authors: Zhenqiao Song, Ramith Hettiarachchi, Chuan Li, Jianwen Xie, Lei Li

Abstract: The de novo design of ligand-binding proteins with tailored functions is essential for advancing biotechnology and molecular medicine, yet existing AI approaches are limited by scarce protein-ligand complex data. To circumvent this data bottleneck, we leverage the abundant natural language descriptions characterizing protein-ligand interactions. Here, we introduce InstructPro, a family of generative models that design proteins following the guidance of natural language instructions and ligand formulas. InstructPro produces protein sequences consistent with specified function descriptions and ligand targets. To enable training and evaluation, we develop InstructProBench, a large-scale dataset of 9.6 million (function description, ligand, protein) triples. We train two model variants -- InstructPro-1B and InstructPro-3B -- that substantially outperform strong baselines. InstructPro-1B achieves an AlphaFold3 ipTM of 0.918 and a binding affinity of -8.764 on seen ligands, while maintaining robust performance in a zero-shot settin...