[2601.01095] NarrativeTrack: Evaluating Entity-Centric Reasoning for

[2601.01095] NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding

arXiv - Machine Learning March 31, 2026 4 min read

About this article

Abstract page for arXiv paper 2601.01095: NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding

Computer Science > Computer Vision and Pattern Recognition arXiv:2601.01095 (cs) [Submitted on 3 Jan 2026 (v1), last revised 28 Mar 2026 (this version, v2)] Title:NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding Authors:Hyeonjeong Ha, Jinjin Ge, Bo Feng, Kaixin Ma, Gargi Chakraborty View a PDF of the paper titled NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding, by Hyeonjeong Ha and 4 other authors View PDF HTML (experimental) Abstract:Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains underexplored. True narrative understanding requires grounding who is doing what, when, and where, maintaining coherent entity representations across dynamic visual and temporal contexts. We introduce NarrativeTrack, the first benchmark to evaluate narrative understanding in MLLMs through fine-grained entity-centric reasoning. Unlike existing benchmarks limited to short clips or coarse scene-level semantics, we decompose videos into constituent entities and examine their continuity via a Compositional Reasoning Progression (CRP), a structured evaluation framework that progressively increases narrative complexity across three dimensions: entity existence, entity changes, and entity ambiguity. CRP challenges models to advance from temporal persistence to contextual evolution and fine-grained percept...

Originally published on March 31, 2026. Curated by AI News.

Llms

[R] Reference model free behavioral discovery of AudiBench model organisms via Probe-Mediated Adaptive Auditing

Anthropic's AuditBench - 56 Llama 3.3 70B models with planted hidden behaviors - their best agent detects the behaviros 10-13% of the tim...

Reddit - Machine Learning · 1 min · about 1 hour ago

Llms

[P] Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.

The problem If you work with Italian text and local models, you know the pain. Every open-source LLM out there treats Italian as an after...

Reddit - Machine Learning · 1 min · about 2 hours ago

Llms

I have been coding for 11 years and I caught myself completely unable to debug a problem without AI assistance last month. That scared me more than anything I have seen in this industry.

I want to be honest about something that happened to me because I think it is more common than people admit. Last month I hit a bug in a ...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

Llms

OpenClaw security checklist: practical safeguards for AI agents

Here is one of the better quality guides on the ensuring safety when deploying OpenClaw: https://chatgptguide.ai/openclaw-security-checkl...

Reddit - Artificial Intelligence · 1 min · about 9 hours ago

[2601.01095] NarrativeTrack: Evaluating Entity-Centric Reasoning for Narrative Understanding

About this article

Related Articles

[R] Reference model free behavioral discovery of AudiBench model organisms via Probe-Mediated Adaptive Auditing

[P] Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.

I have been coding for 11 years and I caught myself completely unable to debug a problem without AI assistance last month. That scared me more than anything I have seen in this industry.

OpenClaw security checklist: practical safeguards for AI agents

No comments

Stay updated with AI News