[2510.11579] MS-Mix: Sentiment-Guided Adaptive Augmentation for Multimodal Sentiment Analysis
Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.11579 (cs) [Submitted on 13 Oct 2025 (v1), last revised 2 Apr 2026 (this version, v3)]

Title: MS-Mix: Sentiment-Guided Adaptive Augmentation for Multimodal Sentiment Analysis
Authors: Hongyu Zhu, Lin Chen, Xin Jin, Mingsheng Shang

Abstract: Multimodal Sentiment Analysis (MSA) integrates complementary features from text, video, and audio for robust emotion understanding in human interactions. However, models suffer from data scarcity and high annotation costs, which severely limit real-world deployment in social media analytics and human-computer systems. Existing Mixup-based augmentation techniques, when naively applied to MSA, often produce semantically inconsistent samples and amplify label noise by ignoring emotional semantics across modalities. To address these challenges, we propose MS-Mix, an adaptive, emotion-sensitive augmentation framework that automatically optimizes data quality in multimodal settings. Its key components are: (1) a sentiment-aware sample selection strategy that filters incompatible pairs via latent-space semantic similarity to prevent contradictory emotion mixing; (2) a sentiment-intensity-guided module with multi-head self-attention for computing modality-specific mixing ratios conditioned on emotional sa...
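
The abstract describes two algorithmic components: compatible-pair selection by latent-space similarity, and per-modality mixing ratios produced by multi-head self-attention. The sketch below is not the authors' code; it is a minimal illustration under assumed shapes, thresholds, and module names (select_compatible_pairs, ModalityMixRatio, ms_mix are all hypothetical), showing one plausible way such a pipeline could be wired together in PyTorch.

```python
# Hedged sketch of the two components named in the abstract (assumptions, not
# the paper's implementation): cosine-similarity pair filtering in a latent
# sentiment space, and attention-derived mixing ratios per modality.
import torch
import torch.nn.functional as F
from torch import nn


def select_compatible_pairs(sent_emb: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Pair each sample with its most sentiment-similar partner in the batch.

    sent_emb: (batch, dim) latent sentiment embeddings.
    `threshold` is an assumed hyperparameter, not a value from the paper.
    """
    sim = F.cosine_similarity(sent_emb.unsqueeze(1), sent_emb.unsqueeze(0), dim=-1)
    sim.fill_diagonal_(-1.0)                          # never pair a sample with itself
    best = sim.argmax(dim=1)                          # most similar partner per sample
    compatible = sim.gather(1, best.unsqueeze(1)).squeeze(1) >= threshold
    # Samples with no compatible partner fall back to themselves (no mixing).
    identity = torch.arange(sent_emb.size(0), device=sent_emb.device)
    return torch.where(compatible, best, identity)


class ModalityMixRatio(nn.Module):
    """Derive one mixing ratio per modality via multi-head self-attention."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, modality_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (batch, num_modalities, dim), e.g. text/video/audio features.
        attended, _ = self.attn(modality_feats, modality_feats, modality_feats)
        return torch.sigmoid(self.score(attended)).squeeze(-1)   # (batch, num_modalities)


def ms_mix(feats: torch.Tensor, sent_emb: torch.Tensor, ratio_net: ModalityMixRatio):
    """Mix each sample with a sentiment-compatible partner, modality by modality."""
    partner = select_compatible_pairs(sent_emb)
    lam = ratio_net(feats)                            # (batch, num_modalities)
    mixed = lam.unsqueeze(-1) * feats + (1 - lam.unsqueeze(-1)) * feats[partner]
    return mixed, partner, lam
```

In standard Mixup the labels are interpolated with the same coefficient as the inputs; because the abstract is truncated, how MS-Mix aggregates the per-modality ratios for label mixing is not specified here, so the sketch leaves labels untouched.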