[2510.05761] Early Multimodal Prediction of Cross-Lingual Meme Virality on Reddit: A Time-Window Analysis
Summary
This article summarises a study that predicts the virality of Reddit memes within early time windows after posting, using a cross-lingual, multimodal dataset and both interpretable and deep learning models. The study highlights the importance of network and temporal features for early prediction.
Why It Matters
Understanding meme virality matters to marketers and to researchers who study social media dynamics. This study provides a prediction framework that improves accuracy across languages and communities, which can inform content strategy and engagement measurement.
Key Takeaways
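- A large-scale time-series dataset of 46,578 memes, collected from 25 meme-centric subreddits across eight language groups with more than one million engagement tracking points, was used to predict virality.
- A Hybrid Score was developed to define virality: it normalizes engagement by community size and incorporates dynamic features such as velocity and acceleration.
- Multimodal features combining visual, textual, contextual, network, and temporal signals significantly improve prediction accuracy.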
- Early predictions rely heavily on network context, shifting to temporal dynamics as engagement grows.
- The study challenges traditional static definitions of virality, which rely on simple volume thresholds with arbitrary cut-offs, by framing virality as a dynamic process.
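The Hybrid Score is described here only at a high level: engagement normalised by community size, combined with dynamic features such as velocity and acceleration. The sketch below shows one way such a score could be computed from engagement tracking points. It is not the authors' formula: the `TrackingPoint` fields, treating upvotes plus comments as engagement, the finite-difference definitions of velocity and acceleration, and the weights are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical record for one engagement tracking point (not the paper's schema).
@dataclass
class TrackingPoint:
    hours: float   # hours since the meme was posted
    upvotes: int   # post score observed at this time
    comments: int  # comment count observed at this time


def engagement_dynamics(points: List[TrackingPoint], subscribers: int
                        ) -> Tuple[List[float], List[float], List[float]]:
    """Return community-normalised engagement, velocity and acceleration."""
    if subscribers <= 0:
        raise ValueError("subscribers must be positive")
    pts = sorted(points, key=lambda p: p.hours)
    if len(pts) < 3:
        raise ValueError("need at least three tracking points")
    t = [p.hours for p in pts]
    if any(b <= a for a, b in zip(t, t[1:])):
        raise ValueError("tracking times must be strictly increasing")
    # Engagement (upvotes + comments) per subscriber at each tracking point
    eng = [(p.upvotes + p.comments) / subscribers for p in pts]
    # Velocity: change in normalised engagement per hour between points
    vel = [(eng[i + 1] - eng[i]) / (t[i + 1] - t[i]) for i in range(len(eng) - 1)]
    # Acceleration: change in velocity per hour between interval midpoints
    mid = [(t[i] + t[i + 1]) / 2 for i in range(len(t) - 1)]
    acc = [(vel[i + 1] - vel[i]) / (mid[i + 1] - mid[i]) for i in range(len(vel) - 1)]
    return eng, vel, acc


def hybrid_score(points: List[TrackingPoint], subscribers: int,
                 weights: Tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Hypothetical weighted sum of final volume, peak velocity and peak acceleration."""
    eng, vel, acc = engagement_dynamics(points, subscribers)
    w_volume, w_velocity, w_accel = weights  # illustrative weights only
    return w_volume * eng[-1] + w_velocity * max(vel) + w_accel * max(acc)


history = [TrackingPoint(0, 0, 0), TrackingPoint(1, 90, 10),
           TrackingPoint(2, 280, 20), TrackingPoint(3, 570, 30)]
print(hybrid_score(history, subscribers=1000))  # approximately 0.41
```

Normalising by subscriber count makes a meme in a small subreddit comparable with one in a large subreddit, while the velocity and acceleration terms reward fast, accelerating growth rather than raw volume alone.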
arXiv:2510.05761 (cs), Computer Science > Artificial Intelligence
Submitted on 7 Oct 2025 (v1), last revised 21 Feb 2026 (this version, v2)
Authors: Sedat Dogan, Nina Dethlefs, Debarati Chakraborty
Abstract
Memes are a central part of online culture, yet their virality remains difficult to predict, especially in cross-lingual settings. We present a large-scale, time-series dataset of 46,578 Reddit memes collected from 25 meme-centric subreddits across eight language groups, with more than one million engagement tracking points. We propose a data-driven definition of virality based on a Hybrid Score that normalises engagement by community size and integrates dynamic features such as velocity and acceleration. This approach directly addresses the field's reliance on static, simple volume-based thresholds with arbitrary cut-offs. Building on this target, we construct a multimodal feature set that combines Visual, Textual, Contextual, Network, and Temporal signals, including structured annotations from a multimodal LLM to scale cross-lingual content labelling in a consistent way. We benchmark interpretable baselines (XGBoost, MLP) against end-to-end deep models (BERT, InceptionV3, CLIP) across ...
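The time-window framing implies that a prediction made, for example, one hour after posting may only use information observed by then. The following sketch builds one fused feature vector per window under stated assumptions: the function names (`temporal_features`, `fuse`, `windowed_vectors`) are hypothetical, fusion is plain concatenation of per-modality vectors (early fusion), and the static groups stand in for outputs of extractors such as those the abstract mentions (CLIP, BERT or InceptionV3 embeddings, LLM annotations). The numbers are placeholders, and the paper's actual pipeline may differ.

```python
from typing import Dict, List, Sequence, Tuple

# Feature groups named in the abstract; a fixed order keeps vectors aligned.
MODALITIES = ("visual", "textual", "contextual", "network", "temporal")


def temporal_features(history: Sequence[Tuple[float, int]],
                      window_hours: float) -> List[float]:
    """Hypothetical temporal features using only points inside the window.

    history holds (hours_since_post, score) tracking points.
    """
    observed = sorted(p for p in history if p[0] <= window_hours)
    if not observed:
        return [0.0, 0.0]
    last_t, last_score = observed[-1]
    # Average engagement rate up to the last observation in the window
    rate = last_score / last_t if last_t > 0 else 0.0
    return [float(last_score), rate]


def fuse(groups: Dict[str, Sequence[float]]) -> List[float]:
    """Early fusion: concatenate per-modality vectors in a fixed order."""
    missing = [m for m in MODALITIES if m not in groups]
    if missing:
        raise KeyError(f"missing feature groups: {missing}")
    vector: List[float] = []
    for name in MODALITIES:
        vector.extend(float(x) for x in groups[name])
    return vector


def windowed_vectors(static: Dict[str, Sequence[float]],
                     history: Sequence[Tuple[float, int]],
                     windows: Sequence[float]) -> Dict[float, List[float]]:
    """One fused feature vector per prediction window (e.g. 1h, 3h, 6h)."""
    return {w: fuse({**static, "temporal": temporal_features(history, w)})
            for w in windows}


# Placeholder static features; real ones would come from content/network models.
static = {"visual": [0.1, 0.2], "textual": [0.3],
          "contextual": [1.0], "network": [5.0]}
history = [(0.5, 10), (1.0, 40), (2.0, 100), (4.0, 300)]
print(windowed_vectors(static, history, windows=[1, 3]))
```

Filtering tracking points by the window before computing temporal features keeps later engagement from leaking into an early prediction, while the static groups are reused unchanged across windows.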