[2602.14401] pFedNavi: Structure-Aware Personalized Federated Vision-Language Navigation for Embodied AI
Summary
The paper presents pFedNavi, a personalized federated learning framework for Vision-Language Navigation (VLN) that addresses privacy concerns and improves navigation success rates through adaptive client-specific model adjustments.
Why It Matters
As VLN applications grow, privacy and data heterogeneity pose significant challenges. pFedNavi offers a novel solution that enhances performance while maintaining user privacy, making it relevant for developers and researchers in AI and robotics.
Key Takeaways
- pFedNavi personalizes federated learning by identifying client-specific layers through layer-wise mixing coefficients.
- The framework outperforms traditional FedAvg methods in navigation tasks.
- Reported gains include up to a 7.5% improvement in navigation success rate, along with faster convergence.
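The layer-wise personalization idea can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not give the exact fusion rule, so the snippet assumes a simple convex blend per layer, `alpha * local + (1 - alpha) * global`, with hand-picked coefficients. The function name `fuse_layers`, the layer names, and all values are hypothetical.

```python
# Minimal sketch of layer-wise parameter mixing (an assumed rule, not the paper's).
# Each layer gets a coefficient in [0, 1]: 0 keeps the global weights,
# 1 keeps the client's local weights, values in between blend the two.

def fuse_layers(global_params, local_params, alpha):
    """Return per-layer blends: alpha * local + (1 - alpha) * global."""
    fused = {}
    for name, global_weights in global_params.items():
        # Layers without a coefficient default to fully shared (alpha = 0).
        a = min(max(alpha.get(name, 0.0), 0.0), 1.0)
        fused[name] = [
            a * local_w + (1.0 - a) * global_w
            for global_w, local_w in zip(global_weights, local_params[name])
        ]
    return fused


# Hypothetical layer names and toy weights, for illustration only.
global_params = {"encoder": [2.0, 2.0], "decoder": [2.0, 2.0]}
local_params = {"encoder": [4.0, 4.0], "decoder": [4.0, 4.0]}

# Assumed coefficients: the encoder stays shared, the decoder is mostly local.
alpha = {"encoder": 0.0, "decoder": 0.75}

fused = fuse_layers(global_params, local_params, alpha)
print(fused)
```

In this toy setup the encoder keeps the global weights while the decoder moves most of the way toward the client's local weights, which mirrors the stated goal of balancing global knowledge sharing with local specialization. In the framework described in the abstract, the coefficients are identified adaptively rather than fixed by hand.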
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.14401 (cs) [Submitted on 16 Feb 2026]
Authors: Qingqian Yang, Hao Wang, Sai Qian Zhang, Jian Li, Yang Hua, Miao Pan, Tao Song, Zhengwei Qi, Haibing Guan
Abstract: Vision-Language Navigation (VLN) requires large-scale trajectory instruction data from private indoor environments, raising significant privacy concerns. Federated Learning (FL) mitigates this by keeping data on-device, but vanilla FL struggles under VLN's extreme cross-client heterogeneity in environments and instruction styles, making a single global model suboptimal. This paper proposes pFedNavi, a structure-aware and dynamically adaptive personalized federated learning framework tailored for VLN. Our key idea is to personalize where it matters: pFedNavi adaptively identifies client-specific layers via layer-wise mixing coefficients, and performs fine-grained parameter fusion on the selected components (e.g., the encoder-decoder projection and environment-sensitive decoder layers) to balance global knowledge sharing with local specialization. We evaluate pFedNavi on two standard VLN benchmarks, R2R and RxR, using both ResNet and CLIP visual representations. Across ...