[2603.17246] On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings
Computer Science > Machine Learning
arXiv:2603.17246 (cs)
[Submitted on 18 Mar 2026 (v1), last revised 20 Mar 2026 (this version, v2)]

Title: On the Cone Effect and Modality Gap in Medical Vision-Language Embeddings
Authors: David Restrepo, Miguel L Martins, Chenwei Wu, Luis Filipe Nakayama, Diego M Lopez, Stergios Christodoulidis, Maria Vakalopoulou, Enzo Ferrante

Abstract: Vision-Language Models (VLMs) exhibit a characteristic "cone effect": nonlinear encoders map embeddings into highly concentrated regions of the representation space, contributing to the cross-modal separation known as the modality gap. While this phenomenon has been widely observed, its practical impact on supervised multimodal learning, particularly in medical domains, remains unclear. In this work, we introduce a lightweight post-hoc mechanism that keeps pretrained VLM encoders frozen while continuously controlling cross-modal separation through a single hyperparameter λ. This enables systematic analysis of how the modality gap affects downstream multimodal performance without expensive retraining. We evaluate generalist (CLIP, SigLIP) and medically specialized (BioMedCLIP, MedSigLIP) models across diverse medical and natural datasets in a supervised multimodal setting. Results consistently show that reducing excessi...
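The abstract does not specify the exact form of the post-hoc mechanism, but a common way to modulate the modality gap without retraining is to shift each modality's (frozen) embeddings along the inter-modal centroid-difference direction and renormalize. The sketch below illustrates that idea with a single interpolation parameter `lam` standing in for the paper's λ; the function name and parameterization are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def control_modality_gap(img_emb, txt_emb, lam):
    """Hypothetical post-hoc gap control for frozen VLM embeddings.

    img_emb, txt_emb: arrays of shape (N, d), L2-normalized rows
    (as produced by CLIP/SigLIP-style encoders).
    lam = 0.0 leaves embeddings unchanged; lam = 1.0 removes the
    centroid offset between modalities before renormalizing.
    """
    # Gap vector: difference between the two modality centroids.
    gap = img_emb.mean(axis=0) - txt_emb.mean(axis=0)

    # Move each modality halfway toward the other, scaled by lam.
    img_shifted = img_emb - 0.5 * lam * gap
    txt_shifted = txt_emb + 0.5 * lam * gap

    # Re-project onto the unit hypersphere, matching the encoders'
    # output convention.
    img_shifted /= np.linalg.norm(img_shifted, axis=1, keepdims=True)
    txt_shifted /= np.linalg.norm(txt_shifted, axis=1, keepdims=True)
    return img_shifted, txt_shifted
```

Because the encoders stay frozen and only this cheap vector arithmetic changes, λ can be swept to study how gap size affects downstream performance, which is the kind of systematic analysis the abstract describes.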