[2603.21647] FedCVU: Federated Learning for Cross-View Video Understanding
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.21647 (cs)
[Submitted on 23 Mar 2026]

Title: FedCVU: Federated Learning for Cross-View Video Understanding

Authors: Shenghan Zhang, Run Ling, Ke Cao, Ao Ma, Zhanjie Zhang

Abstract: Federated learning (FL) has emerged as a promising paradigm for privacy-preserving multi-camera video understanding. However, applying FL to cross-view scenarios faces three major challenges: (i) heterogeneous viewpoints and backgrounds lead to highly non-IID client distributions and overfitting to view-specific patterns, (ii) local distribution biases cause misaligned representations that hinder consistent cross-view semantics, and (iii) large video architectures incur prohibitive communication overhead. To address these issues, we propose FedCVU, a federated framework with three components: VS-Norm, which preserves normalization parameters to handle view-specific statistics; CV-Align, a lightweight contrastive regularization module to improve cross-view representation alignment; and SLA, a selective layer aggregation strategy that reduces communication without sacrificing accuracy. Extensive experiments on action understanding and person re-identification tasks under a cross-view protocol demonstrate that FedCVU consistently boosts unseen-view accuracy while maintaining...
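Two of the components named in the abstract, VS-Norm and SLA, concern which parameters take part in server-side aggregation. The abstract does not give their exact rules, so the following is only a minimal sketch under stated assumptions: normalization parameters are identified by a substring in their name and never leave the client, the set of shared layers is given as a fixed list of name prefixes, and the shared layers are combined with a plain sample-weighted average in the style of FedAvg. The helper names (`is_norm_param`, `aggregate_selected`, `apply_update`) and the parameter names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# A model state is represented here as a mapping from parameter name
# to a flat list of floats (an illustrative stand-in for real tensors).
State = Dict[str, List[float]]


def is_norm_param(name: str) -> bool:
    # Assumption: normalization layers can be recognized by their name.
    return any(tag in name for tag in ("bn", "norm"))


def aggregate_selected(
    client_states: Sequence[State],
    client_sizes: Sequence[int],
    shared_prefixes: Sequence[str],
) -> State:
    """Sample-weighted average over the selected, non-normalization layers."""
    total = sum(client_sizes)
    update: State = {}
    for name, reference in client_states[0].items():
        if is_norm_param(name):
            continue  # normalization statistics stay local to each client
        if not any(name.startswith(p) for p in shared_prefixes):
            continue  # layer is not selected for communication
        averaged = [0.0] * len(reference)
        for state, size in zip(client_states, client_sizes):
            weight = size / total
            for i, value in enumerate(state[name]):
                averaged[i] += weight * value
        update[name] = averaged
    return update


def apply_update(local_state: State, update: State) -> State:
    # Overwrite only the aggregated layers; everything else stays local.
    merged = dict(local_state)
    merged.update(update)
    return merged


# Hypothetical two-client example with 1 and 3 local samples.
client_a = {
    "backbone.conv1.weight": [1.0, 1.0],
    "backbone.bn1.weight": [0.5],
    "head.weight": [0.0, 4.0],
}
client_b = {
    "backbone.conv1.weight": [2.0, 2.0],
    "backbone.bn1.weight": [1.5],
    "head.weight": [4.0, 0.0],
}
update = aggregate_selected([client_a, client_b], [1, 3], ["head."])
new_a = apply_update(client_a, update)
```

In this sketch only `head.weight` is transmitted and averaged, while the convolution and normalization parameters of each client are left untouched. How FedCVU actually selects layers, and how CV-Align enters the local training objective, is not recoverable from the abstract alone.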