[2511.06450] Countering Multi-modal Representation Collapse through Rank-targeted Fusion


arXiv - Machine Learning

Summary

This paper presents a novel framework, the Rank-enhancing Token Fuser, that counters multi-modal representation collapse. By using effective rank as a single measure of both feature collapse and modality collapse, it improves multi-modal fusion for human action anticipation.

Why It Matters

As multi-modal systems become increasingly prevalent in applications like action recognition, addressing representation collapse is crucial for model performance. This research provides a unified approach to quantifying and countering both forms of collapse during fusion, which can improve outcomes in downstream tasks such as human action anticipation.

Key Takeaways

  • Introduces Rank-enhancing Token Fuser to counter representation collapse.
  • Demonstrates how effective rank can quantify and mitigate feature and modality collapse.
  • Validates the approach through extensive experiments, outperforming existing methods by up to 3.74%.
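The effective rank mentioned above is a standard spectral measure (the exponential of the entropy of the normalized singular-value distribution); a minimal sketch of computing it for a feature matrix, independent of the paper's implementation:

```python
import numpy as np

def effective_rank(features: np.ndarray) -> float:
    """Effective rank: exp of the Shannon entropy of the
    normalized singular-value distribution of the matrix."""
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()          # normalize singular values to a distribution
    p = p[p > 0]             # drop zeros to avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))

# A collapsed representation concentrates energy in few directions:
rank_one = np.outer(np.ones(64), np.random.randn(128))  # exactly rank 1
healthy = np.random.randn(64, 128)                      # near full rank
print(effective_rank(rank_one))  # close to 1.0
print(effective_rank(healthy))   # much larger
```

Intuitively, a representation whose dimensions have collapsed (eigenspectrum dominated by a few directions) scores near 1, while a well-spread representation scores near its ambient dimension, which is what makes the measure usable as a collapse diagnostic.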

Computer Science > Computer Vision and Pattern Recognition

arXiv:2511.06450 (cs). Submitted on 9 Nov 2025 (v1), last revised 23 Feb 2026 (this version, v2).

Title: Countering Multi-modal Representation Collapse through Rank-targeted Fusion

Authors: Seulgi Kim, Kiran Kokilepersaud, Mohit Prabhushankar, Ghassan AlRegib

Abstract: Multi-modal fusion methods often suffer from two types of representation collapse: feature collapse, where individual dimensions lose their discriminative power (as measured by eigenspectra), and modality collapse, where one dominant modality overwhelms the other. Applications like human action anticipation that require fusing multifarious sensor data are hindered by both feature and modality collapse. However, existing methods attempt to counter feature collapse and modality collapse separately. This is because there is no unifying framework that efficiently addresses feature and modality collapse in conjunction. In this paper, we posit the utility of effective rank as an informative measure that can be utilized to quantify and counter both representation collapses. We propose the Rank-enhancing Token Fuser, a theoretically grounded fusion framework that selectively blends less informative features from one modality with complementary features from another modality. We show tha...
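The abstract's core idea is selectively blending a modality's less informative feature dimensions with complementary features from the other modality. The paper's actual fuser is more sophisticated; as a loose, hypothetical illustration of that general idea only (not the authors' method), one crude variant could rank modality A's dimensions by variance and inject modality B into the weakest ones:

```python
import numpy as np

def blend_low_information_dims(x_a: np.ndarray, x_b: np.ndarray,
                               frac: float = 0.25) -> np.ndarray:
    """Illustrative stand-in for rank-targeted blending: replace the
    lowest-variance fraction of modality A's feature dimensions with
    the corresponding dimensions of modality B."""
    var = x_a.var(axis=0)                 # per-dimension informativeness proxy
    k = int(frac * x_a.shape[1])          # number of dims to replace
    low = np.argsort(var)[:k]             # least informative dims of A
    fused = x_a.copy()
    fused[:, low] = x_b[:, low]           # inject complementary features
    return fused
```

The variance proxy, the replacement rule, and the `frac` parameter are all assumptions made for this sketch; the paper instead grounds the blending in effective rank, which accounts for the joint eigenspectrum rather than per-dimension statistics.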
