[2602.13476] AsyncVLA: An Asynchronous VLA for Fast and Robust Navigation on the Edge
Summary
AsyncVLA is an asynchronous control framework for robotic navigation that decouples semantic reasoning from reactive execution to preserve real-time performance. In real-world, vision-based navigation with communication delays of up to 6 seconds, it achieves a 40% higher success rate than state-of-the-art baselines.
Why It Matters
Large robotic foundation models incur high inference latency, which in dynamic environments can break the control loop and make the models unsafe for real-time deployment. By keeping reactive control onboard while the large model runs remotely, this approach could make robots operating in dynamic settings safer and more reliable.
Key Takeaways
- AsyncVLA enhances navigation by separating high-level reasoning from execution.
- The framework achieves a 40% higher success rate than state-of-the-art baselines.
- It is evaluated with communication delays of up to 6 seconds.
- It uses an end-to-end finetuning protocol and a trajectory re-weighting strategy that prioritizes dynamic interactions.
- It is evaluated on real-world vision-based navigation tasks.
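The separation of high-level reasoning from execution can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: `slow_planner` and `edge_adapter` are hypothetical stand-ins (the real components are learned models), the world is one-dimensional, and the communication delay is modeled as a fixed number of control ticks. It only shows the control pattern: guidance arrives late and stale, while the onboard loop acts on every tick using the freshest local observation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Guidance:
    target: float   # position the slow model asks the robot to reach
    issued_at: int  # tick of the observation the guidance was computed from


def slow_planner(goal: float, tick: int) -> Guidance:
    """Hypothetical stand-in for the large remote model (slow, high-level)."""
    return Guidance(target=goal, issued_at=tick)


def edge_adapter(guidance: Optional[Guidance], position: float,
                 blocked: bool, max_step: float = 0.5) -> float:
    """Hypothetical stand-in for the onboard module; called on every tick.

    It follows the latest (possibly stale) guidance, but reacts immediately
    to the current observation: if the path is blocked now, it stops.
    """
    if guidance is None or blocked:
        return 0.0
    error = guidance.target - position
    return max(-max_step, min(max_step, error))


def simulate(ticks: int, delay: int, blocked_ticks: Set[int],
             goal: float = 5.0) -> List[Tuple[int, float, Optional[int]]]:
    """Run the fast loop for `ticks` steps with a planner that takes `delay` ticks.

    Returns (tick, position, guidance_age) per step; guidance_age is None
    until the first guidance has arrived.
    """
    position = 0.0
    latest: Optional[Guidance] = None
    in_flight: Optional[Tuple[int, Guidance]] = None  # (arrival_tick, guidance)
    log: List[Tuple[int, float, Optional[int]]] = []
    for t in range(ticks):
        # Deliver guidance whose round trip has completed.
        if in_flight is not None and in_flight[0] <= t:
            latest, in_flight = in_flight[1], None
        # Issue a new request as soon as the previous one has returned.
        if in_flight is None:
            in_flight = (t + delay, slow_planner(goal, t))
        # The fast loop never waits for the slow one.
        position += edge_adapter(latest, position, blocked=t in blocked_ticks)
        age = None if latest is None else t - latest.issued_at
        log.append((t, position, age))
    return log
```

Calling `simulate(ticks=20, delay=4, blocked_ticks={6, 7})` keeps the robot idle until the first guidance arrives at tick 4, pauses it on the two blocked ticks even though the guidance in use predates the obstacle, and then lets it continue to the goal. In this toy setup the guidance in use is always between 4 and 7 ticks old, which is the kind of staleness a fast onboard loop has to absorb.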
Computer Science > Robotics
arXiv:2602.13476 (cs)
[Submitted on 13 Feb 2026]
Title: AsyncVLA: An Asynchronous VLA for Fast and Robust Navigation on the Edge
Authors: Noriaki Hirose, Catherine Glossop, Dhruv Shah, Sergey Levine
Abstract: Robotic foundation models achieve strong generalization by leveraging internet-scale vision-language representations, but their massive computational cost creates a fundamental bottleneck: high inference latency. In dynamic environments, this latency breaks the control loop, rendering powerful models unsafe for real-time deployment. We propose AsyncVLA, an asynchronous control framework that decouples semantic reasoning from reactive execution. Inspired by hierarchical control, AsyncVLA runs a large foundation model on a remote workstation to provide high-level guidance, while a lightweight, onboard Edge Adapter continuously refines actions at high frequency. To bridge the domain gap between these asynchronous streams, we introduce an end-to-end finetuning protocol and a trajectory re-weighting strategy that prioritizes dynamic interactions. We evaluate our approach on real-world vision-based navigation tasks with communication delays up to 6 seconds. AsyncVLA achieves a 40% higher success rate than state-of-the-art baselines, effectively bridging the gap between the semanti...