[2602.18019] DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE

arXiv - AI · February 23, 2026 · 4 min read · Article

Summary

The paper introduces DeepSVU, a new Security-oriented Video Understanding task that covers both identifying and locating threats and attributing and evaluating their causes. To address it, the authors propose a Unified Physical-world Regularized MoE (UPRM) approach.

Why It Matters

DeepSVU targets a gap in current video analysis: prior work detects and localizes threats such as shootings and robberies but largely cannot explain what caused them. Attributing causes could make the output of surveillance systems more useful for security work.

Key Takeaways

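DeepSVU extends video threat detection from identifying and locating threats to attributing and evaluating their causes.
The Unified Physical-world Regularized MoE (UPRM) approach models coarse-to-fine physical-world information (human behavior, object interactions, background context) and adaptively trades off these factors.
Experiments reportedly show the proposed approach outperforming existing Video-LLMs and non-VLM methods.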

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18019 (cs)
[Submitted on 20 Feb 2026]

Title: DeepSVU: Towards In-depth Security-oriented Video Understanding via Unified Physical-world Regularized MoE

Authors: Yujie Jin, Wenxin Zhang, Jingjing Wang, Guodong Zhou

Abstract: In the literature, prior research on Security-oriented Video Understanding (SVU) has predominantly focused on detecting and localizing threats (e.g., shootings, robberies) in videos, while largely lacking the capability to generate and evaluate the threat causes. Motivated by these gaps, this paper introduces a new chat-paradigm SVU task, i.e., In-depth Security-oriented Video Understanding (DeepSVU), which aims not only to identify and locate the threats but also to attribute and evaluate the causes of threatening segments. Furthermore, this paper reveals two key challenges in the proposed task: 1) how to effectively model the coarse-to-fine physical-world information (e.g., human behavior, object interactions and background context) to boost the DeepSVU task; and 2) how to adaptively trade off these factors. To tackle these challenges, this paper proposes a new Unified Physical-world Regularized MoE (UPRM) approach. Specifically, UPRM incorporates two key components: the Unified Phy...
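
The second challenge in the abstract, adaptively trading off several physical-world factors, is the kind of problem a mixture-of-experts (MoE) gate is commonly used for. The sketch below is a generic softmax-gated mixture, not the paper's UPRM method (whose details are cut off in the abstract above). The three experts, their feature vectors, and the gate scores are all hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(expert_outputs, gate_scores):
    """Weight each expert's output vector by its gate weight and sum them."""
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    mixed = [0.0] * dim
    for weight, output in zip(weights, expert_outputs):
        for i in range(dim):
            mixed[i] += weight * output[i]
    return weights, mixed


# Hypothetical 2-d features from three illustrative experts (assumed names).
expert_outputs = [
    [1.0, 0.0],  # "human behavior" expert
    [0.0, 1.0],  # "object interactions" expert
    [0.5, 0.5],  # "background context" expert
]

# Hypothetical gate scores; in a trained model a small network would
# produce these per input rather than having them fixed by hand.
gate_scores = [2.0, 0.0, 0.0]

weights, mixed = moe_combine(expert_outputs, gate_scores)
print("gate weights:", [round(w, 3) for w in weights])
print("mixed output:", [round(v, 3) for v in mixed])
```

Because the gate weights sum to one, raising one expert's score shifts the mixed output toward that expert's features, which is the "adaptive trade-off" idea in its simplest form.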
