[2510.20095] BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.20095 (cs)
[Submitted on 23 Oct 2025 (v1), last revised 1 Mar 2026 (this version, v3)]

Title: BioCAP: Exploiting Synthetic Captions Beyond Labels in Biological Foundation Models
Authors: Ziheng Zhang, Xinyue Ma, Arpita Chowdhury, Elizabeth G. Campolongo, Matthew J. Thompson, Net Zhang, Samuel Stevens, Hilmar Lapp, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao, Jianyang Gu

Abstract: This work investigates descriptive captions as an additional source of supervision for biological multimodal foundation models. Images and captions can be viewed as complementary samples from the latent morphospace of a species, each capturing certain biological traits. Incorporating captions during training encourages alignment with this shared latent structure, emphasizing potentially diagnostic characters while suppressing spurious correlations. The main challenge, however, lies in obtaining faithful, instance-specific captions at scale. This requirement has limited the utilization of natural language supervision in organismal biology compared with many other scientific domains. We address this gap by generating synthetic captions with multimodal large language models (MLLMs), guided by Wikipedia-derived visual information and taxon-tailored format exa...
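The abstract describes aligning images and captions in a shared latent space during training. A minimal sketch of the general technique, a CLIP-style symmetric contrastive (InfoNCE) loss over paired image and caption embeddings, is shown below; this is an illustration of caption-based alignment in general, not the paper's actual objective, and the function name and temperature value are assumptions.

```python
import numpy as np

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (image, caption) embedding pairs.

    Matched pairs share a row index; the loss pulls each image toward its
    own caption and pushes it away from the other captions in the batch.
    """
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) similarity matrix
    labels = np.arange(logits.shape[0])         # matched pairs lie on the diagonal

    def cross_entropy(l):
        # Row-wise softmax cross-entropy against the diagonal targets,
        # with max-subtraction for numeric stability.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly paired embeddings should yield a lower loss than mismatched ones, which is the signal that drives the alignment the abstract refers to.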