[2602.19710] Universal Pose Pretraining for Generalizable Vision-Language-Action Policies

arXiv - Machine Learning

Summary

The paper presents Pose-VLA, a framework for Vision-Language-Action (VLA) models that decouples training into separate pre-training and post-training phases to improve training efficiency and the generalization of robotic action policies.

Why It Matters

This research addresses critical limitations in existing VLA models, particularly their inefficiency and inability to generalize across diverse tasks. By introducing a structured pre-training approach, it offers a pathway to improve robotic performance and adaptability, which is essential for real-world applications in robotics and AI.

Key Takeaways

  • Pose-VLA decouples VLA training into pre-training and post-training phases for improved efficiency.
  • The framework uses discrete pose tokens for universal representation, enhancing spatial grounding.
  • Pose-VLA achieves state-of-the-art results on RoboTwin 2.0 and competitive performance on LIBERO.
  • Real-world experiments confirm robust generalization with minimal demonstrations per task.
  • The proposed method addresses feature collapse and low training efficiency in existing models.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.19710 (cs) [Submitted on 23 Feb 2026]

Title: Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
Authors: Haitao Lin, Hanyang Yu, Jingshun Huang, He Zhang, Yonggen Ling, Ping Tan, Xiangyang Xue, Yanwei Fu

Abstract: Existing Vision-Language-Action (VLA) models often suffer from feature collapse and low training efficiency because they entangle high-level perception with sparse, embodiment-specific action supervision. Since these models typically rely on VLM backbones optimized for Visual Question Answering (VQA), they excel at semantic identification but often overlook subtle 3D state variations that dictate distinct action patterns. To resolve these misalignments, we propose Pose-VLA, a decoupled paradigm that separates VLA training into a pre-training phase for extracting universal 3D spatial priors in a unified camera-centric space, and a post-training phase for efficient embodiment alignment within robot-specific action space. By introducing discrete pose tokens as a universal representation, Pose-VLA seamlessly integrates spatial grounding from diverse 3D datasets with geometry-level trajectories from robotic demonstrations. Our framework follows a two-stage pre-training pipeline, establishing fundamental spat...
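The abstract's central idea, representing continuous camera-centric poses as discrete tokens so that heterogeneous 3D data and robot trajectories share one vocabulary, can be sketched with simple per-dimension binning. This is a hedged illustration only: the bin count, pose ranges, and function names below are assumptions for demonstration, not the paper's actual tokenizer.

```python
import numpy as np

# Illustrative sketch of discrete pose tokenization (assumed scheme, not
# the paper's implementation): a continuous 6-DoF pose in a camera-centric
# frame is quantized per dimension into a fixed vocabulary of integer tokens.

NUM_BINS = 256  # assumed vocabulary size per dimension

# Assumed per-dimension ranges: xyz translation in meters, rpy in radians.
LOWS = np.array([-1.0, -1.0, 0.0, -np.pi, -np.pi, -np.pi])
HIGHS = np.array([1.0, 1.0, 2.0, np.pi, np.pi, np.pi])

def pose_to_tokens(pose):
    """Quantize a continuous 6-DoF pose into integer tokens in [0, NUM_BINS)."""
    pose = np.asarray(pose, dtype=np.float64)
    normed = (pose - LOWS) / (HIGHS - LOWS)            # map each dim to [0, 1]
    return np.clip((normed * NUM_BINS).astype(int), 0, NUM_BINS - 1)

def tokens_to_pose(tokens):
    """Invert the quantization to bin centers (lossy reconstruction)."""
    centers = (np.asarray(tokens) + 0.5) / NUM_BINS    # bin centers in [0, 1]
    return LOWS + centers * (HIGHS - LOWS)

pose = [0.25, -0.10, 0.80, 0.0, 1.57, -0.5]
tokens = pose_to_tokens(pose)
recon = tokens_to_pose(tokens)
# Reconstruction error is bounded by half a bin width per dimension.
assert np.all(np.abs(recon - pose) <= (HIGHS - LOWS) / (2 * NUM_BINS) + 1e-9)
```

Discretizing poses this way lets a language-model-style backbone predict spatial targets with the same cross-entropy objective it uses for text, which is presumably what allows the pre-training phase to mix diverse 3D datasets with robot demonstrations.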

