[2603.26772] From Content to Audience: A Multimodal Annotation Framework for Broadcast Television Analytics
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.26772 (cs)
[Submitted on 24 Mar 2026]

Title: From Content to Audience: A Multimodal Annotation Framework for Broadcast Television Analytics
Authors: Paolo Cupini, Francesco Pierri

Abstract: Automated semantic annotation of broadcast television content presents distinctive challenges, combining structured audiovisual composition, domain-specific editorial patterns, and strict operational constraints. While multimodal large language models (MLLMs) have demonstrated strong general-purpose video understanding capabilities, their comparative effectiveness across pipeline architectures and input configurations in broadcast-specific settings remains empirically undercharacterized. This paper presents a systematic evaluation of multimodal annotation pipelines applied to broadcast television news in the Italian setting. We construct a domain-specific benchmark of clips labeled across four semantic dimensions: visual environment classification, topic classification, sensitive content detection, and named entity recognition. Two different pipeline architectures are evaluated across nine frontier models, including Gemini 3.0 Pro, LLaMA 4 Maverick, Qwen-VL variants, and Gemma 3, under progressively enriched input strategies combini...
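The four semantic dimensions the benchmark labels could be represented, in a minimal sketch, as a per-clip annotation record with a basic validation step. All names, label sets, and the `ClipAnnotation` class below are illustrative assumptions for exposition, not the authors' actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical closed label set for the visual-environment dimension;
# the paper's real taxonomy is not given in the abstract.
VISUAL_ENVIRONMENTS = {"studio", "outdoor", "archive", "graphics"}

@dataclass
class ClipAnnotation:
    """Illustrative per-clip record covering the four dimensions
    named in the abstract: visual environment, topics, sensitive
    content, and named entities."""
    clip_id: str
    visual_environment: str
    topics: list[str] = field(default_factory=list)
    sensitive_content: bool = False
    named_entities: list[str] = field(default_factory=list)

    def validate(self) -> bool:
        # Sanity check: environment label must come from the closed set.
        return self.visual_environment in VISUAL_ENVIRONMENTS

# Example usage with made-up values.
ann = ClipAnnotation(
    clip_id="news_clip_0001",
    visual_environment="studio",
    topics=["politics"],
    sensitive_content=False,
    named_entities=["Roma"],
)
print(ann.validate())  # → True
```

A record like this would be the target output of each annotation pipeline, making model outputs directly comparable across the two architectures and nine models the paper evaluates.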