[2601.21670] Improving Multimodal Learning with Dispersive and Anchoring Regularization
Computer Science > Computer Vision and Pattern Recognition

arXiv:2601.21670 (cs)

[Submitted on 29 Jan 2026 (v1), last revised 5 Apr 2026 (this version, v2)]

Title: Improving Multimodal Learning with Dispersive and Anchoring Regularization

Authors: Zixuan Xia, Hao Wang, Pengcheng Weng, Yanyu Qian, Yangxin Xu, William Dan, Fei Wang

Abstract: Multimodal learning aims to integrate complementary information from heterogeneous modalities, yet strong optimization alone does not guarantee well-structured representations. Even under carefully balanced training schemes, multimodal models often exhibit geometric pathologies, including intra-modal representation collapse and sample-level cross-modal inconsistency, which degrade both unimodal robustness and multimodal fusion. We identify representation geometry as a missing control axis in multimodal learning and propose \regName, a lightweight geometry-aware regularization framework. \regName enforces two complementary constraints on intermediate embeddings: an intra-modal dispersive regularization that promotes representation diversity, and an inter-modal anchoring regularization that bounds sample-level cross-modal drift without rigid alignment. The proposed regularizer is plug-and-play, requires no architectural modifications, and is compatible with various training pa...
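The abstract describes two complementary penalties on intermediate embeddings: a dispersive term that discourages intra-modal collapse and an anchoring term that bounds how far paired cross-modal embeddings may drift apart. The paper's exact loss definitions are not given in the abstract, so the following is only a minimal NumPy sketch of one plausible instantiation (a uniformity-style dispersion penalty and a margin hinge for anchoring); the function names, the log-sum-exp form, and the `margin` parameter are assumptions, not the authors' formulation.

```python
import numpy as np

def dispersive_loss(z, eps=1e-8):
    """Hypothetical intra-modal dispersion penalty.

    Normalizes embeddings to the unit sphere and penalizes high
    pairwise cosine similarity, so the loss is largest when all
    embeddings collapse to a single direction.
    """
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    sim = z @ z.T                          # pairwise cosine similarities
    n = z.shape[0]
    off_diag = sim[~np.eye(n, dtype=bool)]  # exclude self-similarity
    return float(np.log(np.mean(np.exp(off_diag))))

def anchoring_loss(z_a, z_b, margin=0.5):
    """Hypothetical inter-modal anchoring penalty.

    Applies a hinge on the per-sample distance between paired
    embeddings from two modalities: drift within the margin is free
    (no rigid alignment), drift beyond it is penalized linearly.
    """
    dist = np.linalg.norm(z_a - z_b, axis=1)
    return float(np.mean(np.maximum(0.0, dist - margin)))
```

Under this reading, the total objective would add both terms to the task loss with small weights, leaving the architecture and training pipeline untouched, which matches the abstract's "plug-and-play" claim.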