[2602.12684] Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution


arXiv - Machine Learning

Summary

Xiaomi-Robotics-0 is an open-sourced vision-language-action (VLA) model optimized for real-time execution, achieving state-of-the-art performance on robotic tasks.

Why It Matters

This research advances robotics by optimizing a vision-language-action model for real-time execution, addressing the inference latency that limits deployment of such models on physical robots. Open-sourcing the model and its checkpoints lowers the barrier for further research and development, potentially enabling more innovative applications in robotics and AI.

Key Takeaways

  • Xiaomi-Robotics-0 achieves high performance in real-time robotic tasks.
  • The model is pre-trained on large-scale cross-embodiment robot trajectories and vision-language data, enhancing its action-generation capabilities.
  • Innovative techniques were developed to reduce inference latency during real-robot rollouts.
  • The model has been validated in both simulation and real-world scenarios, showing superior success rates.
  • Code and model checkpoints are available for public use, fostering community research.
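
One of the takeaways above is asynchronous execution to reduce inference latency during real-robot rollouts. The paper does not publish its implementation in this summary, but the general idea can be sketched: predict the next action chunk in a background thread while the current chunk is still being executed, so the robot never idles waiting for the model. The `AsyncChunkExecutor` class, the `policy` callable, and the `get_obs`/`send_action` hooks below are all hypothetical names for illustration, not the paper's API.

```python
import threading
import queue
import time

class AsyncChunkExecutor:
    """Sketch of asynchronous action-chunk execution: inference for the
    next chunk runs in a background thread while the control loop is
    still executing the current chunk, hiding model latency."""

    def __init__(self, policy, control_hz=30.0):
        self.policy = policy            # callable: observation -> list of actions
        self.control_hz = control_hz    # control-loop rate in Hz
        self.next_chunk = queue.Queue(maxsize=1)

    def _infer(self, obs):
        # Run model inference off the control thread.
        self.next_chunk.put(self.policy(obs))

    def run(self, get_obs, send_action, num_chunks=10):
        # First chunk must be predicted synchronously.
        chunk = self.policy(get_obs())
        for _ in range(num_chunks):
            # Kick off inference for the next chunk immediately.
            worker = threading.Thread(target=self._infer, args=(get_obs(),))
            worker.start()
            # Execute the current chunk at the control rate meanwhile.
            for action in chunk:
                send_action(action)
                time.sleep(1.0 / self.control_hz)
            # Swap in the freshly predicted chunk.
            chunk = self.next_chunk.get()
            worker.join()
```

In a real deployment the background inference would run on the GPU and the control loop on a real-time thread; the point of the sketch is only the overlap of inference with execution.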

Computer Science > Robotics — arXiv:2602.12684 (cs) — Submitted on 13 Feb 2026

Title: Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution

Authors: Rui Cai, Jun Guo, Xinze He, Piaopiao Jin, Jie Li, Bingxuan Lin, Futeng Liu, Wei Liu, Fei Ma, Kun Ma, Feng Qiu, Heng Qu, Yifei Su, Qiao Sun, Dong Wang, Donghao Wang, Yunhong Wang, Rujie Wu, Diyun Xiang, Yu Yang, Hangjun Ye, Yuan Zhang, Quanyun Zhou

Abstract: In this report, we introduce Xiaomi-Robotics-0, an advanced vision-language-action (VLA) model optimized for high performance and fast and smooth real-time execution. The key to our method lies in a carefully designed training recipe and deployment strategy. Xiaomi-Robotics-0 is first pre-trained on large-scale cross-embodiment robot trajectories and vision-language data, endowing it with broad and generalizable action-generation capabilities while avoiding catastrophic forgetting of the visual-semantic knowledge of the underlying pre-trained VLM. During post-training, we propose several techniques for training the VLA model for asynchronous execution to address the inference latency during real-robot rollouts. During deployment, we carefully align the timesteps of consecutive predicted action chunks to ensure continuous and seamless real-time rollouts. We evaluate ...
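
The abstract's deployment step, aligning the timesteps of consecutive predicted action chunks, can be illustrated with a small sketch. Because the next chunk is predicted from an observation taken while the previous chunk is still executing, its first few actions cover timesteps that have already elapsed by the time it arrives; dropping that overlapping prefix makes the new chunk begin exactly where the old one ends. The function name and timestep bookkeeping below are assumptions for illustration, not the paper's actual mechanism.

```python
def align_chunks(prev_end_step, new_chunk_start_step, new_chunk):
    """Drop the already-elapsed prefix of a newly predicted action chunk.

    new_chunk was predicted from an observation at timestep
    new_chunk_start_step, but the previous chunk kept executing up to
    prev_end_step while inference ran.  Trimming the overlap lets the
    new chunk take over seamlessly at the current timestep."""
    overlap = max(0, prev_end_step - new_chunk_start_step)
    return new_chunk[overlap:]

# Example: the model predicted a 10-step chunk from an observation at
# step 10, but execution of the old chunk continued to step 14 during
# inference, so the first 4 actions of the new chunk are stale.
aligned = align_chunks(prev_end_step=14, new_chunk_start_step=10,
                       new_chunk=list(range(10)))
```

Here `aligned` keeps only the six actions for timesteps 14 through 19, so the robot's trajectory stays continuous across the chunk boundary.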
