[2602.12705] MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Summary
MedXIAOHE is a medical vision-language foundation model for medical understanding and reasoning in real-world clinical applications. It reports state-of-the-art performance across diverse medical benchmarks.
Why It Matters
Clinical settings call for AI systems that can handle complex, multi-step reasoning and decision-making. MedXIAOHE combines several training methodologies with a focus on reliability, aiming to support healthcare professionals and improve patient outcomes.
Key Takeaways
- MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities.
- The model uses an entity-aware continual pretraining framework to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases).
- It incorporates reinforcement learning and tool-augmented agentic training for multi-step medical reasoning and interaction.
- It focuses on evidence-grounded reasoning and low-hallucination report generation.
- The authors aim to inspire further research in medical AI applications.
Computer Science > Computation and Language
arXiv:2602.12705 (cs)
Submitted on 13 Feb 2026
Title: MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
Authors: Baorong Shi, Bo Cui, Boyuan Jiang, Deli Yu, Fang Qian, Haihua Yang, Huichao Wang, Jiale Chen, Jianfei Pan, Jieqiong Cao, Jinghao Lin, Kai Wu, Lin Yang, Shengsheng Yao, Tao Chen, Xiaojun Xiao, Xiaozhong Ji, Xu Wang, Yijun He, Zhixiong Yang
Abstract: We present MedXIAOHE, a medical vision-language foundation model designed to advance general-purpose medical understanding and reasoning in real-world clinical applications. MedXIAOHE achieves state-of-the-art performance across diverse medical benchmarks and surpasses leading closed-source multimodal systems on multiple capabilities. To achieve this, we propose an entity-aware continual pretraining framework that organizes heterogeneous medical corpora to broaden knowledge coverage and reduce long-tail gaps (e.g., rare diseases). For medical expert-level reasoning and interaction, MedXIAOHE incorporates diverse medical reasoning patterns via reinforcement learning and tool-augmented agentic training, enabling multi-step diagnostic reasoning with verifiable decision traces. To improve reliability in real-world use, MedXIAOHE integrates user-preference rubrics, evidence-grounded reasoning, and low-hallucination...