[2602.15236] BindCLIP: A Unified Contrastive-Generative Representation Learning Framework for Virtual Screening
Summary
BindCLIP introduces a framework for virtual screening that unifies contrastive and generative learning, producing embeddings that are more sensitive to fine-grained binding interactions and thereby improving ligand ranking.
Why It Matters
This research addresses a limitation of existing CLIP-style virtual screening methods, which can overlook fine-grained binding interactions and latch onto shortcut correlations in training data. By integrating generative and contrastive learning, BindCLIP improves the accuracy of ligand ranking, a meaningful advance for drug discovery and molecular biology applications.
Key Takeaways
- BindCLIP improves ligand identification by integrating contrastive and generative learning.
- The framework mitigates reliance on shortcut correlations in training data.
- Experiments show substantial gains in out-of-distribution virtual screening.
- Pose-level supervision shapes the retrieval embedding space toward interaction-relevant features.
- Hard-negative augmentation and anchoring regularizers prevent representation collapse.
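To make the hard-negative takeaway concrete, here is a minimal sketch (not the paper's implementation) of how hard negatives are typically folded into a CLIP-style similarity matrix: each pocket is scored against the in-batch ligands plus extra decoy ligand embeddings, so the contrastive loss must separate true binders from deliberately confusable ones. The function name and the shape conventions are illustrative assumptions.

```python
import numpy as np

def pocket_ligand_logits_with_hard_negatives(pocket_emb, ligand_emb,
                                             hard_neg_emb, temperature=0.07):
    """Illustrative sketch: build a pocket -> ligand logit matrix that
    includes extra hard-negative ligand columns.

    pocket_emb:   (B, D) pocket embeddings
    ligand_emb:   (B, D) matched ligand embeddings (positives on diagonal)
    hard_neg_emb: (K, D) hard-negative ligand embeddings (e.g. decoys)
    Returns a (B, B + K) matrix of temperature-scaled cosine similarities.
    """
    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    p = l2_normalize(pocket_emb)
    l = l2_normalize(ligand_emb)
    h = l2_normalize(hard_neg_emb)
    # Columns 0..B-1 are in-batch ligands (diagonal = positive pairs);
    # columns B..B+K-1 are hard negatives that sharpen the decision boundary.
    return np.concatenate([p @ l.T, p @ h.T], axis=1) / temperature
```

Training would then apply a cross-entropy loss over each row with the diagonal index as the target, exactly as in a standard CLIP objective but with a harder candidate set.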
Computer Science > Machine Learning
arXiv:2602.15236 (cs) [Submitted on 16 Feb 2026]
Title: BindCLIP: A Unified Contrastive-Generative Representation Learning Framework for Virtual Screening
Authors: Anjie Qiao, Zhen Wang, Yaliang Li, Jiahua Rao, Yuedong Yang
Abstract: Virtual screening aims to efficiently identify active ligands from massive chemical libraries for a given target pocket. Recent CLIP-style models such as DrugCLIP enable scalable virtual screening by embedding pockets and ligands into a shared space. However, our analyses indicate that such representations can be insensitive to fine-grained binding interactions and may rely on shortcut correlations in training data, limiting their ability to rank ligands by true binding compatibility. To address these issues, we propose BindCLIP, a unified contrastive-generative representation learning framework for virtual screening. BindCLIP jointly trains pocket and ligand encoders using CLIP-style contrastive learning together with a pocket-conditioned diffusion objective for binding pose generation, so that pose-level supervision directly shapes the retrieval embedding space toward interaction-relevant features. To further mitigate shortcut reliance, we introduce hard-negative augmentation and a ligand-ligand anchoring regularizer t...
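The CLIP-style contrastive objective the abstract describes can be sketched as a symmetric InfoNCE loss over paired pocket and ligand embeddings: matched pairs sit on the diagonal of a cosine-similarity matrix, and the loss pulls them together while pushing apart mismatched pairs in both retrieval directions. This is a generic illustration of the CLIP objective, not BindCLIP's actual code; the temperature value and function names are assumptions.

```python
import numpy as np

def symmetric_info_nce(pocket_emb, ligand_emb, temperature=0.07):
    """Generic CLIP-style symmetric contrastive loss.

    pocket_emb, ligand_emb: (B, D) arrays where row i of each is a
    matched pocket-ligand pair. Returns a scalar loss.
    """
    # L2-normalize so dot products are cosine similarities.
    p = pocket_emb / np.linalg.norm(pocket_emb, axis=1, keepdims=True)
    l = ligand_emb / np.linalg.norm(ligand_emb, axis=1, keepdims=True)
    logits = p @ l.T / temperature  # (B, B); positives on the diagonal

    def diag_cross_entropy(z):
        # Numerically stable log-softmax; targets are indices 0..B-1.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the pocket->ligand and ligand->pocket directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))
```

In BindCLIP this contrastive term is trained jointly with a pocket-conditioned diffusion objective for pose generation, so the shared embedding space is shaped by both retrieval and pose-level supervision.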