[2602.18019] DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE
Summary
The paper introduces DeepSVU, a new Security-oriented Video Understanding task that goes beyond identifying and locating threats to attributing and evaluating their causes, and proposes a Unified Physical-world Regularized MoE (UPRM) approach for it.
Why It Matters
DeepSVU addresses critical gaps in current video analysis by not only detecting threats but also understanding their underlying causes. This advancement is vital for enhancing security measures and improving the effectiveness of surveillance systems.
Key Takeaways
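- DeepSVU extends video threat detection: beyond identifying and locating threats, it attributes and evaluates their causes.
- The proposed Unified Physical-world Regularized MoE (UPRM) approach targets two challenges: modeling coarse-to-fine physical-world information (human behavior, object interactions, background context) and adaptively trading off these factors.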
- Experiments show DeepSVU outperforms existing Video-LLMs and non-VLM methods.
arXiv:2602.18019 (cs), Computer Vision and Pattern Recognition. Submitted on 20 Feb 2026.
Authors: Yujie Jin, Wenxin Zhang, Jingjing Wang, Guodong Zhou
Abstract: In the literature, prior research on Security-oriented Video Understanding (SVU) has predominantly focused on detecting and localizing threats (e.g., shootings, robberies) in videos, while largely lacking the capability to generate and evaluate the causes of those threats. Motivated by these gaps, this paper introduces a new chat-paradigm SVU task, In-depth Security-oriented Video Understanding (DeepSVU), which aims not only to identify and locate threats but also to attribute and evaluate the causes of the threatening segments. Furthermore, this paper reveals two key challenges in the proposed task: 1) how to effectively model coarse-to-fine physical-world information (e.g., human behavior, object interactions and background context) to boost the DeepSVU task; and 2) how to adaptively trade off these factors. To tackle these challenges, this paper proposes a new Unified Physical-world Regularized MoE (UPRM) approach. Specifically, UPRM incorporates two key components: the Unified Phy...
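The second challenge named in the abstract, adaptively trading off physical-world factors, is the kind of problem a mixture-of-experts (MoE) gate is generally used for. The sketch below is a generic softmax-gated mixture, not the paper's UPRM, whose components are not described in the excerpt above. The expert names, feature vectors, and gate scores are all hypothetical placeholders chosen for illustration.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: returns non-negative weights summing to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(
    expert_outputs: Dict[str, List[float]],
    gate_scores: Dict[str, float],
) -> List[float]:
    """Combine expert output vectors as a weighted sum with softmax gate weights."""
    names = list(expert_outputs)
    weights = softmax([gate_scores[name] for name in names])
    dim = len(expert_outputs[names[0]])
    mixed = [0.0] * dim
    for name, weight in zip(names, weights):
        for i, value in enumerate(expert_outputs[name]):
            mixed[i] += weight * value
    return mixed


# Hypothetical toy outputs, one "expert" per physical-world factor
# mentioned in the abstract (the values are made up).
experts = {
    "human_behavior": [1.0, 0.0],
    "object_interaction": [0.0, 1.0],
    "background_context": [1.0, 1.0],
}

# Hypothetical gate scores; in a trained model these would be predicted
# from the input rather than fixed by hand.
gates = {
    "human_behavior": 0.0,
    "object_interaction": 0.0,
    "background_context": 0.0,
}

print(mix_experts(experts, gates))
```

With equal gate scores every expert contributes equally; raising one score shifts the mixture toward that expert, which is the "adaptive trade-off" idea in its simplest form.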