[2511.21331] The More, the Merrier: Contrastive Fusion for

[2511.21331] The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

arXiv - AI April 06, 2026 4 min read

About this article

Abstract page for arXiv paper 2511.21331: The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment

Computer Science > Computer Vision and Pattern Recognition arXiv:2511.21331 (cs) [Submitted on 26 Nov 2025 (v1), last revised 3 Apr 2026 (this version, v2)] Title:The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment Authors:Stefanos Koutoupis, Michaela Areti Zervou, Konstantinos Kontras, Maarten De Vos, Panagiotis Tsakalides, Grigorios Tsagkatakis View a PDF of the paper titled The More, the Merrier: Contrastive Fusion for Higher-Order Multimodal Alignment, by Stefanos Koutoupis and 5 other authors View PDF HTML (experimental) Abstract:Learning joint representations across multiple modalities remains a central challenge in multimodal machine learning. Prevailing approaches predominantly operate in pairwise settings, aligning two modalities at a time. While some recent methods aim to capture higher-order interactions among multiple modalities, they often overlook or insufficiently preserve pairwise relationships, limiting their effectiveness on single-modality tasks. In this work, we introduce Contrastive Fusion (ConFu), a framework that jointly embeds both individual modalities and their fused combinations into a unified representation space, where modalities and their fused counterparts are aligned. ConFu extends traditional pairwise contrastive objectives with an additional fused-modality contrastive term, encouraging the joint embedding of modality pairs with a third modality. This formulation enables ConFu to capture higher-order dependencies...

Originally published on April 06, 2026. Curated by AI News.

Ai Startups

Top 10 AI certifications and courses for 2026

This article reviews the top 10 AI certifications and courses for 2026, highlighting their significance in a rapidly evolving field and t...

AI Events · 15 min · about 2 hours ago

Llms

[2604.01989] Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

Abstract page for arXiv paper 2604.01989: Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

arXiv - AI · 4 min · about 3 hours ago

Machine Learning

[2604.01447] Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

Abstract page for arXiv paper 2604.01447: Better Rigs, Not Bigger Networks: A Body Model Ablation for Gaussian Avatars

arXiv - AI · 3 min · about 3 hours ago

Llms

[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing

Abstract page for arXiv paper 2603.24326: Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing