[2602.17689] Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction

arXiv - Machine Learning · 4 min read

Summary

This article presents Robust Multi-Modal Masked Reconstruction (Robust-MMR), a novel self-supervised pre-training framework for medical vision-language models that enhances robustness against domain shifts, achieving improved performance across various benchmarks.

Why It Matters

The research addresses a critical gap in the robustness of medical vision-language models, which are essential for accurate clinical reasoning. By improving model performance under varying conditions, this work has significant implications for real-world medical applications, potentially enhancing diagnostic accuracy and patient outcomes.

Key Takeaways

  • Robust-MMR incorporates robustness objectives into pre-training for medical models.
  • The framework shows improved accuracy in cross-domain medical tasks.
  • Domain-invariant representations enhance model reliability for clinical applications.
  • Robust-MMR outperforms existing methods in various medical benchmarks.
  • The study highlights the importance of robustness in AI for healthcare.

Computer Science > Machine Learning
arXiv:2602.17689 (cs) · Submitted on 6 Feb 2026

Title: Robust Pre-Training of Medical Vision-and-Language Models with Domain-Invariant Multi-Modal Masked Reconstruction
Authors: Melika Filvantorkaman, Mohsen Piri

Abstract: Medical vision-language models show strong potential for joint reasoning over medical images and clinical text, but their performance often degrades under domain shift caused by variations in imaging devices, acquisition protocols, and reporting styles. Existing multi-modal pre-training methods largely overlook robustness, treating it as a downstream adaptation problem. In this work, we propose Robust Multi-Modal Masked Reconstruction (Robust-MMR), a self-supervised pre-training framework that explicitly incorporates robustness objectives into masked vision-language learning. Robust-MMR integrates asymmetric perturbation-aware masking, domain-consistency regularization, and modality-resilience constraints to encourage domain-invariant representations. We evaluate Robust-MMR on multiple medical vision-language benchmarks, including medical visual question answering (VQA-RAD, SLAKE, VQA-2019), cross-domain image-text classification (MELINDA), and robust image-caption retrieval (ROCO). Robust-MMR achieves 78.9% cross-domain accu...
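The abstract names three robustness components but does not give their exact formulation here. As a minimal sketch of the general idea, the snippet below combines a masked-reconstruction loss with a domain-consistency term that penalizes representation drift between an input and a perturbed (domain-shifted) copy of it. The toy linear encoder, the perturbation, and the weight `lam` are all illustrative assumptions, not the paper's actual architecture or objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy encoder: a linear projection standing in for a vision/text backbone."""
    return np.tanh(x @ W)

def masked_reconstruction_loss(x, mask, W_enc, W_dec):
    """Encode the visible part of x and score reconstruction on masked entries."""
    z = encode(x * (1 - mask), W_enc)          # encode only the unmasked input
    x_hat = z @ W_dec                          # decode back to input space
    return np.mean(((x_hat - x) * mask) ** 2)  # error on masked positions only

def domain_consistency_loss(x, x_shifted, W_enc):
    """Penalize representation drift under a simulated domain shift."""
    return np.mean((encode(x, W_enc) - encode(x_shifted, W_enc)) ** 2)

# Toy data: one feature vector and a perturbed (domain-shifted) copy of it.
d, h = 16, 8
x = rng.normal(size=(1, d))
x_shifted = x + 0.1 * rng.normal(size=(1, d))    # stands in for device/protocol variation
mask = (rng.random((1, d)) < 0.5).astype(float)  # mask roughly half of the input

W_enc = rng.normal(size=(d, h)) * 0.1
W_dec = rng.normal(size=(h, d)) * 0.1

lam = 1.0  # illustrative weight on the consistency term
total = (masked_reconstruction_loss(x, mask, W_enc, W_dec)
         + lam * domain_consistency_loss(x, x_shifted, W_enc))
print(f"total loss: {total:.4f}")
```

Minimizing both terms jointly pushes the encoder toward representations that reconstruct masked content yet stay stable under domain perturbations, which is the intuition behind domain-invariant pre-training.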

