[2601.01095] NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding
Computer Science > Computer Vision and Pattern Recognition
arXiv:2601.01095 (cs)
[Submitted on 3 Jan 2026 (v1), last revised 28 Mar 2026 (this version, v2)]
Title: NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding
Authors: Hyeonjeong Ha, Jinjin Ge, Bo Feng, Kaixin Ma, Gargi Chakraborty
Abstract: Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, maintaining coherent entity representations across dynamic visual and temporal contexts. We introduce NarrativeTrack, the first benchmark to evaluate narrative understanding in MLLMs through fine-grained entity-centric reasoning. Unlike existing benchmarks limited to short clips or coarse scene-level semantics, we decompose videos into constituent entities and examine their continuity via a Compositional Reasoning Progression (CRP), a structured evaluation framework that progressively increases narrative complexity across three dimensions: entity existence, entity changes, and entity ambiguity. CRP challenges models to advance from temporal persistence to contextual evolution and fine-grained percept...