[2602.15842] Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?
Summary
This article examines the Meme Reply Selection task, analyzing how well large language models (LLMs) can select humorous manga panel responses to social media posts and what their choices reveal about their understanding of social cues and humor.
Why It Matters
Understanding how models interpret and generate humor matters because it reflects how far AI has come in engaging in human-like interaction. This research highlights the limitations of current models in grasping subtle humor, a capability essential for improving AI communication in social contexts.
Key Takeaways
- LLMs show preliminary evidence of capturing complex social cues, such as exaggeration, beyond surface-level semantic matching.
- Visual information does not enhance models' performance in humor selection.
- Current models struggle with subtle differences in humor among similar responses.
- The study introduces a new benchmark for evaluating meme reply selection.
- Selecting contextually humorous replies remains a challenge for AI.
arXiv:2602.15842 (cs), Computer Science > Machine Learning
Submitted on 21 Jan 2026
Title: Memes-as-Replies: Can Models Select Humorous Manga Panel Responses?
Authors: Ryosuke Kohita, Seiichiro Yoshioka
Abstract: Memes are a popular element of modern web communication, used not only as static artifacts but also as interactive replies within conversations. While computational research has focused on analyzing the intrinsic properties of memes, the dynamic and contextual use of memes to create humor remains an understudied area of web science. To address this gap, we introduce the Meme Reply Selection task and present MaMe-Re (Manga Meme Reply Benchmark), a benchmark of 100,000 human-annotated pairs (500,000 total annotations from 2,325 unique annotators) consisting of openly licensed Japanese manga panels and social media posts. Our analysis reveals three key insights: (1) large language models (LLMs) show preliminary evidence of capturing complex social cues such as exaggeration, moving beyond surface-level semantic matching; (2) the inclusion of visual information does not improve performance, revealing a gap between understanding visual content and effectively using it for contextual humor; (3) while LLMs can match human judgments in controlled settings, they struggle to distinguish subtle differences in wit ...
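The abstract describes Meme Reply Selection as choosing the most humorous panel reply to a post, which maps naturally onto a multiple-choice evaluation. The sketch below is a minimal, hypothetical illustration of such a loop, not the authors' protocol: the dataset schema (`post`, `candidates`, `human_choice`), the text-only panel descriptions, the prompt wording, and the model name are all assumptions, since the paper's actual data format is not given in this summary.

```python
# Hypothetical sketch of a meme-reply-selection evaluation loop.
# The item schema and prompt are illustrative assumptions; MaMe-Re's
# real format and the paper's evaluation protocol are not shown here.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@dataclass
class MemeReplyItem:
    post: str              # the social media post being replied to
    candidates: list[str]  # text descriptions of candidate manga panels
    human_choice: int      # index of the panel annotators found funniest


def select_reply(item: MemeReplyItem, model: str = "gpt-4o-mini") -> int:
    """Ask the model to pick the funniest candidate reply; return its index."""
    options = "\n".join(f"{i}. {c}" for i, c in enumerate(item.candidates))
    prompt = (
        f"Post: {item.post}\n\n"
        f"Candidate manga-panel replies:\n{options}\n\n"
        "Which candidate is the most humorous reply to the post? "
        "Answer with the number only."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    answer = resp.choices[0].message.content.strip()
    digits = "".join(ch for ch in answer if ch.isdigit())
    return int(digits) if digits else -1  # -1 marks an unparseable answer


def accuracy(items: list[MemeReplyItem]) -> float:
    """Fraction of items where the model agrees with the human annotation."""
    hits = sum(select_reply(it) == it.human_choice for it in items)
    return hits / len(items)
```

This sketch is deliberately text-only, mirroring the paper's finding (2) that adding visual information did not improve performance; a multimodal variant would pass the panel images themselves rather than text descriptions.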