[2505.05736] Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications
Summary
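The paper introduces MINT (Multimodal Integrated kNowledge Transfer), a framework that uses preference optimization to align unimodal large language models (LLMs) with decision patterns learned from multimodal biomedical data, improving predictive tasks in healthcare applications.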
Why It Matters
This research addresses the challenge of limited high-quality multimodal biomedical data, which hampers the fine-tuning of LLMs for specialized tasks. By leveraging MINT, the study demonstrates significant improvements in predictive accuracy for rare genetic disease prediction and tissue type classification, showcasing the potential of multimodal knowledge transfer in biomedical applications.
Key Takeaways
- MINT aligns unimodal LLMs with multimodal biomedical data through preference optimization.
- The framework shows superior performance in predicting rare genetic diseases and classifying tissue types.
- MINT utilizes an upstream multimodal model to enhance downstream LLM capabilities.
- MINT is primarily implemented with Odds Ratio Preference Optimization (ORPO) as its backbone, though it supports other optimization techniques.
- This approach addresses the scarcity of high-quality multimodal data in biomedical fields.
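To make the preference-optimization idea concrete, the following is a minimal sketch of the ORPO objective for a single preference pair, as the objective is commonly described: a supervised term on the preferred response plus a weighted log odds ratio penalty. It is not the authors' implementation. The function names, the weight `lam`, and the example probabilities are assumptions chosen for illustration, and the way MINT builds its preferred and rejected responses from the upstream multimodal model is not covered in this summary.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is the length-normalized (per-token average) log-probability
    of a response, so it must be strictly negative.
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Sketch of the ORPO loss for one (chosen, rejected) pair.

    lam is a hypothetical weighting of the odds ratio term.
    """
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -avg_logp_chosen
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# Hypothetical pair: the model already favors the chosen response.
aligned = orpo_loss(math.log(0.8), math.log(0.2))
# Same pair with the preference reversed.
misaligned = orpo_loss(math.log(0.2), math.log(0.8))
```

In this sketch the loss is smaller when the model assigns higher probability to the preferred response, which is the behavior preference optimization rewards. In practice the log-probabilities would come from the LLM being aligned rather than from fixed numbers.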
Quantitative Biology > Quantitative Methods
arXiv:2505.05736 (q-bio)
[Submitted on 9 May 2025 (v1), last revised 16 Feb 2026 (this version, v2)]
Title: Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications
Authors: Zhanliang Wang, Da Wu, Quan Nguyen, Zhuoran Xu, Kai Wang
Abstract: The scarcity of high-quality multimodal biomedical data limits the ability to effectively fine-tune pretrained Large Language Models (LLMs) for specialized biomedical tasks. To address this challenge, we introduce MINT (Multimodal Integrated kNowledge Transfer), a framework that aligns unimodal large decoder models with domain-specific decision patterns from multimodal biomedical data through preference optimization. While MINT supports different optimization techniques, we primarily implement it with the Odds Ratio Preference Optimization (ORPO) framework as its backbone. This strategy enables the aligned LLMs to perform predictive tasks using text-only or image-only inputs while retaining knowledge learnt from multimodal data. MINT leverages an upstream multimodal machine learning (MML) model trained on high-quality multimodal data to transfer domain-specific insights to downstream text-only or image-only LLMs. We demonstra...