[2602.12978] Learning Native Continuation for Action Chunking Flow Policies
Summary
This paper presents Legato, a training-time continuation method for action-chunked, flow-based Vision Language Action (VLA) policies that produces smoother trajectories and shorter task completion times.
Why It Matters
Action chunking lets Vision Language Action models run in real time, but naive chunked execution often produces discontinuities at chunk boundaries. Real-Time Chunking (RTC) reduces these discontinuities but operates outside the policy. Legato instead builds continuation into the policy during training, which matters for applications that need smooth and efficient execution, such as robotic manipulation and autonomous systems.
Key Takeaways
- Legato improves action chunking by ensuring smoother trajectories.
- The method reduces spurious multimodal switching during execution.
- Empirical results show a 10% improvement in task completion time.
- Legato supports varying inference delays by conditioning on randomized schedules during training.
- Unlike Real-Time Chunking (RTC), which is external to the policy, Legato learns continuation as part of the policy at training time.
Computer Science > Robotics
arXiv:2602.12978 (cs)
[Submitted on 13 Feb 2026]
Title: Learning Native Continuation for Action Chunking Flow Policies
Authors: Yufeng Liu, Hang Yu, Juntu Zhao, Bocheng Li, Di Zhang, Mingzhu Li, Wenxuan Wu, Yingdong Hu, Junyuan Xie, Junliang Guo, Dequan Wang, Yang Gao
Abstract: Action chunking enables Vision Language Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, leading to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked flow-based VLA policies. Specifically, Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information. Moreover, Legato reshapes the learned flow dynamics to ensure that the denoising process remains consistent between training and inference under per-step guidance. Legato further uses randomized schedule condition during training to support varying inference delays and achieve controllable smoothness. Empirically, Legato produces smoother trajectories and reduces spurious multimodal switching during execution, leading to less hesitation and shorte...
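To make the "schedule-shaped mixture of known actions and noise" in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not specify the schedule shape, so the linear ramp, the scalar (one-dimensional) actions, and the names `ramp_schedule`, `mixed_init`, `sample_schedule`, and `max_delay` are all assumptions made for illustration.

```python
import random


def ramp_schedule(horizon, delay, ramp):
    """Per-step weight on the known action (hypothetical schedule shape).

    Weight is 1.0 for the first `delay` steps (actions already committed
    while inference was running), decays linearly over the next `ramp`
    steps, and is 0.0 (pure noise) for the rest of the chunk.
    """
    weights = []
    for i in range(horizon):
        if i < delay:
            weights.append(1.0)
        elif i < delay + ramp:
            weights.append(1.0 - (i - delay + 1) / (ramp + 1))
        else:
            weights.append(0.0)
    return weights


def mixed_init(known_actions, weights, rng):
    """Initial denoising state: per-step blend of known action and Gaussian noise."""
    return [
        w * a + (1.0 - w) * rng.gauss(0.0, 1.0)
        for a, w in zip(known_actions, weights)
    ]


def sample_schedule(horizon, rng, max_delay=4):
    """Draw a random delay and ramp length, as one might do per training example.

    Assumes max_delay <= horizon.
    """
    delay = rng.randint(0, max_delay)        # inclusive on both ends
    ramp = rng.randint(0, horizon - delay)   # ramp never runs past the chunk
    return ramp_schedule(horizon, delay, ramp)


if __name__ == "__main__":
    rng = random.Random(0)
    previous_chunk_tail = [0.5] * 8          # stand-in for known actions
    weights = ramp_schedule(horizon=8, delay=2, ramp=3)
    print(weights)
    print(mixed_init(previous_chunk_tail, weights, rng))
```

In this sketch, steps with weight 1.0 start exactly at the known action, steps with weight 0.0 start from pure noise, and the ramp in between is where a model would see partial action information. Sampling the delay and ramp per example is one plausible reading of the randomized schedule condition, under the assumptions stated above.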