[2605.07474] ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
Computer Science > Computer Vision and Pattern Recognition
arXiv:2605.07474 (cs) [Submitted on 8 May 2026]

Title: ForgeVLA: Federated Vision-Language-Action Learning without Language Annotations
Authors: Yuhao Zhou, Yunpeng Zhu, Yang Zhou, Jindi Lyu, Jian Lan, Zhangyuan Wang, Dan Si, Thomas Seidl, Qing Ye, Jiancheng Lyu

Abstract: Vision-Language-Action (VLA) models hold great promise for general-purpose robotic intelligence, yet scaling up such models is severely bottlenecked by the high cost of acquiring annotated training data. Fortunately, vision-equipped robots deployed across various domains already produce abundant vision-action pairs that can be leveraged to scale up VLA training more efficiently. However, these raw data cannot be centrally aggregated due to various constraints and also exhibit severe heterogeneity. To address these challenges, in this paper, we propose ForgeVLA, a federated VLA training framework that learns VLA models from distributed vision-action pairs without centralizing raw data or requiring manual annotations. Specifically, each client in ForgeVLA is equipped with an embodied instruction classifier that maps vision-action pairs to a predefined instruction set, recovering the missing language modality and forming complete vision-language-action triplets. Beyond triplet con...
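
The abstract only sketches the per-client pipeline at a high level. The following is a minimal illustrative sketch of that idea, not the authors' implementation: the instruction set, feature dimensions, classifier architecture, and the FedAvg-style aggregation are all assumptions introduced here to show how a client could recover the language modality from local vision-action pairs and how a server could combine client models without seeing raw data.

```python
# Illustrative sketch only. The instruction list, dimensions, classifier design,
# and FedAvg aggregation are assumptions, not details taken from the paper.
import torch
import torch.nn as nn

# Hypothetical predefined instruction set shared by all clients.
INSTRUCTIONS = ["pick up the cube", "open the drawer", "push the button"]

class InstructionClassifier(nn.Module):
    """Maps a (vision, action) pair to an index into the predefined instruction set."""
    def __init__(self, vision_dim=512, action_dim=7, num_instructions=len(INSTRUCTIONS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim + action_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_instructions),
        )

    def forward(self, vision_feat, action):
        return self.net(torch.cat([vision_feat, action], dim=-1))

def build_triplets(classifier, vision_feats, actions):
    """Recover the missing language modality on a client: label each local
    vision-action pair with its most likely instruction, yielding
    (vision, language, action) triplets for VLA training."""
    with torch.no_grad():
        logits = classifier(vision_feats, actions)
        idx = logits.argmax(dim=-1)
    return [(v, INSTRUCTIONS[i], a) for v, i, a in zip(vision_feats, idx.tolist(), actions)]

def fedavg(client_states):
    """Plain FedAvg on the server: average client model parameters so that
    raw vision-action data never leaves the clients."""
    avg = {k: torch.zeros_like(v) for k, v in client_states[0].items()}
    for state in client_states:
        for k, v in state.items():
            avg[k] += v / len(client_states)
    return avg
```

As a usage sketch, each client would run `build_triplets` over its local buffer of vision-action pairs, train its local VLA model on the resulting triplets, and send only the model's `state_dict()` to the server, which calls `fedavg` over the collected states before broadcasting the averaged weights back.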