[2602.21633] Self-Correcting VLA: Online Action Refinement via Sparse World Imagination
Summary
The paper presents Self-Correcting VLA (SC-VLA), an approach that augments vision-language-action models with sparse world imagination to refine actions online and improve task performance in robotics.
Why It Matters
This research addresses limitations in current vision-language-action models by introducing self-correcting mechanisms that enhance predictive planning and physical grounding. The findings contribute to advancements in robotics, particularly in improving task efficiency and success rates in real-world applications.
Key Takeaways
- Self-Correcting VLA integrates sparse world imagination for action refinement (see the sketch after this list).
- The approach enhances task throughput by 16% and success rates by 9%.
- It addresses the limitations of existing VLA models reliant on statistical data priors.
- The method includes online action refinement that adjusts trajectories based on predicted states.
- Real-world experiments validate the effectiveness of the proposed model.
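The sketch below, in PyTorch, illustrates the kind of auxiliary predictive heads described in the takeaways: a scalar task-progress head and a short-horizon trajectory-trend head attached to the policy's features. It is a minimal sketch under assumed names, dimensions, and loss form, not the authors' implementation.

```python
# Hedged sketch (not the paper's code): auxiliary predictive heads on a VLA backbone.
# feat_dim, action_dim, horizon, and the loss weighting are illustrative assumptions.
import torch
import torch.nn as nn

class SparseImaginationHeads(nn.Module):
    def __init__(self, feat_dim: int = 512, action_dim: int = 7, horizon: int = 8):
        super().__init__()
        # Scalar head estimating how far the current task has progressed (0..1).
        self.progress_head = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid()
        )
        # Head forecasting a short-horizon trend of future actions/end-effector states.
        self.trend_head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, horizon * action_dim)
        )
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, policy_features: torch.Tensor):
        progress = self.progress_head(policy_features)           # (B, 1)
        trend = self.trend_head(policy_features)                 # (B, H * A)
        return progress, trend.view(-1, self.horizon, self.action_dim)

def auxiliary_loss(progress_pred, progress_gt, trend_pred, trend_gt):
    # Auxiliary targets constrain the policy features to encode short-term physical evolution.
    return (nn.functional.mse_loss(progress_pred, progress_gt)
            + nn.functional.mse_loss(trend_pred, trend_gt))
```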
Computer Science > Robotics — arXiv:2602.21633 (cs)
[Submitted on 25 Feb 2026]
Title: Self-Correcting VLA: Online Action Refinement via Sparse World Imagination
Authors: Chenyv Liu, Wentao Tan, Lei Zhu, Fengling Li, Jingjing Li, Guoli Yang, Heng Tao Shen
Abstract: Standard vision-language-action (VLA) models rely on fitting statistical data priors, limiting their robust understanding of underlying physical dynamics. Reinforcement learning enhances physical grounding through exploration, yet it typically relies on external reward signals that remain isolated from the agent's internal states. World action models have emerged as a promising paradigm that integrates imagination and control to enable predictive planning, but they rely on implicit context modeling and lack explicit mechanisms for self-improvement. To address these problems, we propose Self-Correcting VLA (SC-VLA), which achieves self-improvement by intrinsically guiding action refinement through sparse imagination. We first design sparse world imagination by integrating auxiliary predictive heads to forecast current task progress and future trajectory trends, thereby constraining the policy to encode short-term physical evolution. We then introduce an online action refinement module that reshapes progress-dependent dense rewards, adjusting trajectory ori...
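As a rough illustration of the refinement step described in the abstract, the sketch below shows one way progress-dependent dense reward shaping and trend-guided action adjustment could be wired together. The shaping weight, blend factor, and function names are assumptions for illustration, not the paper's algorithm.

```python
# Hedged sketch of progress-dependent reward shaping and online action refinement,
# assuming progress/trend predictions from heads like those sketched above.
import numpy as np

def shaped_reward(sparse_reward: float,
                  progress_t: float,
                  progress_prev: float,
                  shaping_weight: float = 0.5) -> float:
    """Add a dense bonus proportional to the predicted progress gained this step."""
    progress_gain = progress_t - progress_prev
    return sparse_reward + shaping_weight * progress_gain

def refine_action(action: np.ndarray,
                  predicted_trend: np.ndarray,
                  blend: float = 0.2) -> np.ndarray:
    """Nudge the policy's action toward the imagined short-horizon trend."""
    # predicted_trend[0] is the next imagined step; blend controls correction strength.
    return (1.0 - blend) * action + blend * predicted_trend[0]
```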