[2602.10556] LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer

arXiv - AI · 4 min read

Summary

The paper introduces Language-Action Pre-Training (LAP), a method enabling robots to perform tasks across different embodiments without fine-tuning, attaining over 50% average zero-shot success on previously unseen robots.

Why It Matters

LAP addresses a critical challenge in robotics by allowing models to operate effectively on new robot embodiments without the need for extensive retraining. This advancement can lead to more adaptable and efficient robotic systems, enhancing their deployment in diverse environments.

Key Takeaways

  • LAP allows zero-shot transfer of policies to new robot embodiments.
  • The method requires no learned tokenizer or costly annotations.
  • LAP-3B achieves over 50% average zero-shot success, doubling previous performance.
  • It unifies action prediction and visual question answering in a shared format.
  • LAP enables efficient adaptation and scaling for robotic applications.

Computer Science > Robotics · arXiv:2602.10556 (cs)

[Submitted on 11 Feb 2026 (v1), last revised 15 Feb 2026 (this version, v2)]

Title: LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer

Authors: Lihan Zha, Asher J. Hancock, Mingtong Zhang, Tenny Yin, Yixuan Huang, Dhruv Shah, Allen Z. Ren, Anirudha Majumdar

Abstract: A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-training, existing Vision-Language-Action models (VLAs) remain tightly coupled to their training embodiments and typically require costly fine-tuning. We introduce Language-Action Pre-training (LAP), a simple recipe that represents low-level robot actions directly in natural language, aligning action supervision with the pre-trained vision-language model's input-output distribution. LAP requires no learned tokenizer, no costly annotation, and no embodiment-specific architectural design. Based on LAP, we present LAP-3B, which to the best of our knowledge is the first VLA to achieve substantial zero-shot transfer to previously unseen robot embodiments without any embodiment-specific fine-tuning. Across multiple novel robots and manipulation tasks, LAP-3B attains over 50% average zero-shot success...
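The abstract's core recipe, serializing low-level robot actions directly as natural-language text so a vision-language model can be supervised with its ordinary next-token objective, can be sketched as below. This is a minimal illustrative sketch: the field names, precision, and gripper encoding are assumptions for illustration, not the paper's actual action format.

```python
def action_to_language(delta_xyz, delta_rpy, gripper):
    """Serialize a 7-DoF end-effector action as a plain-language string.

    Because the target is ordinary text, no learned action tokenizer is
    needed; the VLM's existing vocabulary covers the supervision signal.
    """
    x, y, z = delta_xyz
    roll, pitch, yaw = delta_rpy
    state = "close" if gripper > 0.5 else "open"
    return (f"move x {x:+.3f} y {y:+.3f} z {z:+.3f}, "
            f"rotate roll {roll:+.3f} pitch {pitch:+.3f} yaw {yaw:+.3f}, "
            f"gripper {state}")


def language_to_action(text):
    """Parse the generated string back into numeric values at execution time."""
    tokens = text.replace(",", "").split()
    # Map each token to its successor, e.g. "x" -> "+0.010".
    vals = {tokens[i]: tokens[i + 1] for i in range(len(tokens) - 1)}
    delta_xyz = tuple(float(vals[k]) for k in ("x", "y", "z"))
    delta_rpy = tuple(float(vals[k]) for k in ("roll", "pitch", "yaw"))
    gripper = 1.0 if vals["gripper"] == "close" else 0.0
    return delta_xyz, delta_rpy, gripper
```

A round trip through these two helpers recovers the original action, which is what lets the same text interface serve both action prediction and visual question answering in a shared format.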

