[2507.15852] Advancing Complex Video Object Segmentation via Progressive Concept Construction
Computer Science > Computer Vision and Pattern Recognition
arXiv:2507.15852 (cs)
[Submitted on 21 Jul 2025 (v1), last revised 28 Feb 2026 (this version, v3)]

Title: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Authors: Zhixiong Zhang, Shuangrui Ding, Xiaoyi Dong, Songxin He, Jianfan Lin, Junsong Tang, Yuhang Zang, Yuhang Cao, Dahua Lin, Jiaqi Wang

Abstract: We propose Segment Concept (SeC), a concept-driven video object segmentation (VOS) framework that shifts from conventional feature matching to the progressive construction and utilization of high-level, object-centric representations. SeC employs Large Vision-Language Models (LVLMs) to integrate visual cues across diverse frames, constructing robust conceptual priors. To balance semantic reasoning with computational overhead, SeC forwards the LVLMs only when a new scene appears, injecting concept-level features at those points. To rigorously assess VOS methods in scenarios demanding high-level conceptual reasoning and robust semantic understanding, we introduce the Semantic Complex Scenarios Video Object Segmentation benchmark (SeCVOS). SeCVOS comprises 160 manually annotated multi-scenario videos designed to challenge models with substantial appearance variations and dynamic scene transformations. Empirical ev...
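
The abstract states that SeC runs the LVLM only when a new scene appears. A minimal sketch of that gating idea follows. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how scene changes are detected, so the mean absolute pixel difference, the `threshold` value, and the names `mean_abs_diff`, `gated_concept_updates`, and `expensive_model` are all hypothetical.

```python
from typing import Callable, List, Sequence

# Assumption: a frame is a flat sequence of grayscale intensities in [0, 1].
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def gated_concept_updates(
    frames: Sequence[Frame],
    expensive_model: Callable[[Frame], None],
    threshold: float = 0.3,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Call `expensive_model` only on frames that look like a new scene.

    A frame counts as a new scene when it is the first frame, or when it
    differs from the last frame the model was run on by more than `threshold`.
    Returns the indices of the frames on which the model was called.
    """
    concept_frames: List[int] = []
    reference = None  # last frame the expensive model was run on
    for idx, frame in enumerate(frames):
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            expensive_model(frame)  # stand-in for the LVLM forward pass
            concept_frames.append(idx)
            reference = frame
    return concept_frames


# Two dark frames followed by two bright frames: one simulated scene cut.
video = [[0.0] * 4, [0.05] * 4, [0.9] * 4, [0.95] * 4]
calls: List[Frame] = []
selected = gated_concept_updates(video, calls.append)
print(selected)
```

In this toy video the stand-in model runs on two of the four frames, which is the cost-saving behaviour the abstract describes; a real system would presumably use a learned or feature-based scene-change signal instead of raw pixel differences.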