[2603.01055] MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning
Computer Science > Artificial Intelligence
arXiv:2603.01055 (cs)
[Submitted on 1 Mar 2026]

Title: MMCOMET: A Large-Scale Multimodal Commonsense Knowledge Graph for Contextual Reasoning
Authors: Eileen Wang, Hiba Arnaout, Dhita Pratama, Shuo Yang, Dangyang Liu, Jie Yang, Josiah Poon, Jeff Pan, Caren Han

Abstract: We present MMCOMET, the first multimodal commonsense knowledge graph (MMKG) that integrates physical, social, and eventive knowledge. MMCOMET extends the ATOMIC2020 knowledge graph with a visual dimension through an efficient image retrieval process, resulting in over 900K multimodal triples. This new resource addresses a major limitation of existing MMKGs in supporting complex reasoning tasks such as image captioning and storytelling. Through a standard visual storytelling experiment, we show that our holistic approach enables the generation of richer, more coherent, and more contextually grounded stories than those produced using text-only knowledge. This resource establishes a new foundation for multimodal commonsense reasoning and narrative generation.

Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.01055 [cs.AI] (or arXiv:2603.01055v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.01055
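
Note: the abstract does not specify how the image retrieval or the multimodal triples are implemented. As a rough, hypothetical sketch only, the Python snippet below shows one plausible way a triple extending an ATOMIC2020 relation (here `xEffect`) could be represented and paired with images by cosine similarity over precomputed embeddings. The class and function names, the example events, and the random embeddings standing in for a real text/image encoder are all assumptions, not the authors' method.

```python
from dataclasses import dataclass, field
import numpy as np


@dataclass
class MultimodalTriple:
    """A text triple in ATOMIC2020 style, grounded with retrieved images."""
    head: str                      # e.g. "PersonX bakes bread"
    relation: str                  # e.g. "xEffect"
    tail: str                      # e.g. "PersonX feels satisfied"
    image_ids: list = field(default_factory=list)


def retrieve_images(text_emb: np.ndarray,
                    image_embs: np.ndarray,
                    image_ids: list,
                    top_k: int = 3) -> list:
    """Return the top_k image ids whose embeddings are most similar
    (cosine similarity) to the triple's text embedding."""
    text_emb = text_emb / np.linalg.norm(text_emb)
    image_embs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = image_embs @ text_emb
    best = np.argsort(-scores)[:top_k]
    return [image_ids[i] for i in best]


# Hypothetical usage: random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)
image_embs = rng.normal(size=(1000, 512))
ids = [f"img_{i:04d}" for i in range(1000)]

triple = MultimodalTriple(
    head="PersonX bakes bread",
    relation="xEffect",
    tail="PersonX feels satisfied",
    image_ids=retrieve_images(text_emb, image_embs, ids),
)
print(triple)
```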