[2411.15455] M3TR: Temporal Retrieval Enhanced Multi-Modal Micro-video Popularity Prediction
Subject: Computer Science > Multimedia
arXiv:2411.15455 (cs) [Submitted on 23 Nov 2024 (v1), last revised 26 Feb 2026 (this version, v2)]
Title: M3TR: Temporal Retrieval Enhanced Multi-Modal Micro-video Popularity Prediction
Authors: Jiacheng Lu, Weijian Wang, Mingyuan Xiao, Yang Hua, Tao Song, Jiaru Zhang, Bo Peng, Cheng Hua, Haibing Guan

Abstract: Accurately predicting the popularity of micro-videos is a critical but challenging task, characterized by volatile, "rollercoaster-like" engagement dynamics. Existing methods often fail to capture these complex temporal patterns, leading to inaccurate long-term forecasts. This failure stems from two fundamental limitations: ① a superficial understanding of user feedback dynamics, which overlooks the mutually exciting and decaying nature of interactions such as likes, comments, and shares; and ② retrieval mechanisms that rely solely on static content similarity, ignoring the crucial patterns of how a video's popularity evolves over time. To address these limitations, we propose M³TR, a Temporal Retrieval enhanced Multi-Modal framework that uniquely synergizes fine-grained temporal modeling with a novel temporal-aware retrieval process for Micro-video popularity prediction. At its core, M³TR ...
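The abstract's phrase "mutually exciting and decaying" interactions describes behaviour that a multivariate Hawkes process captures: each past event raises the rate of future events (of its own type and of other types), and that boost fades over time. The sketch below computes Hawkes intensities with an exponential decay kernel for three interaction types. It is only an illustration of the concept under assumed settings. The function name `hawkes_intensity`, the event-type names, and the parameters `mu` (baseline rates), `alpha` (cross-excitation weights), and `beta` (decay rate) are hypothetical. This is not M³TR's method, whose details are truncated in this abstract.

```python
import math
from typing import Dict, List

# Hypothetical interaction types mirroring the abstract's "likes, comments,
# and shares"; M3TR's actual inputs are not specified in this abstract.
TYPES = ["like", "comment", "share"]


def hawkes_intensity(
    t: float,
    history: Dict[str, List[float]],
    mu: Dict[str, float],
    alpha: Dict[str, Dict[str, float]],
    beta: float,
) -> Dict[str, float]:
    """Return the conditional intensity of each interaction type at time t.

    Multivariate Hawkes process with an exponential kernel (an assumed,
    textbook model, not the paper's formulation):
        lambda_i(t) = mu_i + sum_j sum_{t_k < t} alpha[i][j] * exp(-beta * (t - t_k))
    where t_k ranges over past events of type j.
    """
    rates: Dict[str, float] = {}
    for i in TYPES:
        excitation = 0.0
        for j in TYPES:
            for t_k in history.get(j, []):
                if t_k < t:
                    # A past event of type j boosts the rate of type i
                    # (mutual excitation); the boost fades as exp(-beta * dt).
                    excitation += alpha[i][j] * math.exp(-beta * (t - t_k))
        rates[i] = mu[i] + excitation
    return rates


if __name__ == "__main__":
    # Illustrative parameters only (hypothetical values, not fitted to data).
    mu = {k: 0.1 for k in TYPES}
    alpha = {i: {j: 0.2 for j in TYPES} for i in TYPES}
    history = {"like": [0.0, 0.5], "comment": [1.0], "share": []}
    for t in (1.1, 5.0, 50.0):
        rates = hawkes_intensity(t, history, mu, alpha, beta=1.0)
        print(t, {k: round(v, 4) for k, v in rates.items()})
```

Running the example shows the "share" rate rising above its baseline shortly after the likes and comments, even though no shares have occurred yet. It then decays back toward `mu` as time passes. With an exponential kernel, the process stays stable only if the spectral radius of the matrix `alpha / beta` is below 1. The example values (0.6) meet that condition.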