[2602.12691] ALOE: Action-Level Off-Policy Evaluation for Vision-Language-Action Model Post-Training
Summary
The paper presents ALOE, an action-level off-policy evaluation framework aimed at enhancing vision-language-action models through reinforcement learning, demonstrating improved efficiency in real-world tasks.
Why It Matters
ALOE addresses the limitations of traditional on-policy evaluation methods in reinforcement learning, which can hinder the learning process of complex models. By allowing for off-policy evaluation, it enhances the training effectiveness of vision-language-action systems, which are increasingly relevant in robotics and AI applications.
Key Takeaways
- ALOE improves learning efficiency for vision-language-action models.
- The framework utilizes action-level evaluation to enhance credit assignment.
- It supports stable policy improvement in real-world manipulation tasks.
- ALOE demonstrates effectiveness across diverse tasks, including smartphone packing and laundry folding.
- The approach restores reliable off-policy evaluation, which prior work avoided in favor of conservative on-policy estimation.
Computer Science > Robotics · arXiv:2602.12691 (cs)
Submitted on 13 Feb 2026
Authors: Rushuai Yang, Hecheng Wang, Chiming Liu, Xiaohan Yan, Yunlong Wang, Xuan Du, Shuoyu Yue, Yongcheng Liu, Chuheng Zhang, Lizhe Qi, Yi Chen, Wei Shan, Maoqing Yao
Abstract: We study how to improve large foundation vision-language-action (VLA) systems through online reinforcement learning (RL) in real-world settings. Central to this process is the value function, which provides learning signals to guide VLA learning from experience. In practice, the value function is estimated from trajectory fragments collected from different data sources, including historical policies and intermittent human interventions. Estimating the value function of current behavior quality from the mixture data is inherently an off-policy evaluation problem. However, prior work often adopts conservative on-policy estimation for stability, which avoids direct evaluation of the current high-capacity policy and limits learning effectiveness. In this paper, we propose ALOE, an action-level off-policy evaluation framework for VLA post-training. ALOE applies chunking-based temporal-difference bootstrapping to evaluate individual action sequences inste…
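The abstract describes chunking-based temporal-difference bootstrapping, where values are bootstrapped at the granularity of action chunks rather than single steps. The paper does not publish its implementation here, so the following is a minimal illustrative sketch of generic chunk-level TD targets, not ALOE's actual algorithm; the function name, signature, and the simple discounted-sum formulation are all assumptions.

```python
import numpy as np

def chunked_td_targets(rewards, values, chunk_len, gamma=0.99):
    """Illustrative chunk-level TD bootstrap targets (hypothetical helper).

    rewards:   per-step rewards, shape (T,)
    values:    value estimates V(s_t) for t = 0..T, shape (T + 1,);
               values[T] bootstraps past the final chunk
    chunk_len: number of primitive actions per chunk (k)

    Each chunk starting at step t receives the target
        sum_{i=0}^{k-1} gamma^i * r_{t+i}  +  gamma^k * V(s_{t+k}),
    i.e. the TD backup skips ahead one whole chunk instead of one step.
    """
    T = len(rewards)
    targets = []
    for t in range(0, T, chunk_len):
        k = min(chunk_len, T - t)  # trailing chunk may be shorter
        ret = 0.0
        for i in range(k):
            ret += (gamma ** i) * rewards[t + i]
        ret += (gamma ** k) * values[t + k]  # bootstrap at the chunk boundary
        targets.append(ret)
    return np.array(targets)
```

Bootstrapping only at chunk boundaries shortens the effective horizon by a factor of the chunk length, which is one common motivation for evaluating action sequences rather than individual actions.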