[2602.13764] MOTIF: Learning Action Motifs for Few-shot Cross-Embodiment Transfer
Summary
The paper presents MOTIF, a framework for few-shot cross-embodiment transfer in robotics that addresses kinematic heterogeneity across robots and the high cost of collecting real-world demonstrations.
Why It Matters
MOTIF enhances the efficiency of robotic learning by enabling robots to adapt quickly to new embodiments with minimal data, which is crucial for advancing generalist robotic applications and reducing the reliance on extensive training datasets.
Key Takeaways
- MOTIF decouples action motifs from heterogeneous action data for better adaptability.
- The framework uses vector quantization and embodiment adversarial constraints to ensure consistency.
- MOTIF significantly outperforms existing methods in few-shot transfer scenarios, achieving up to 43.7% improvement in real-world settings.
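The first two takeaways can be made concrete with a toy sketch of the motif codebook. This is not the paper's implementation; it only illustrates the vector-quantization step, where continuous action-segment embeddings are snapped to their nearest entry in a shared motif codebook. All names, sizes, and the fixed codebook are illustrative assumptions.

```python
import numpy as np

def quantize_motifs(action_latents, codebook):
    """Map each latent action segment to its nearest codebook motif.

    action_latents: (N, D) continuous embeddings of action segments.
    codebook:       (K, D) motif vectors (learnable in practice; fixed here).
    Returns (indices, quantized) where quantized[i] = codebook[indices[i]].
    """
    # Squared Euclidean distance between every latent and every motif.
    d2 = ((action_latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)       # nearest-motif assignment
    quantized = codebook[idx]     # value passed on (straight-through in training)
    return idx, quantized

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))   # K=8 motifs, D=4 dims (toy sizes)
# Two latents that sit very close to motifs 2 and 5.
latents = codebook[[2, 5]] + 0.01 * rng.normal(size=(2, 4))
idx, q = quantize_motifs(latents, codebook)
print(idx.tolist())  # -> [2, 5]: each latent snaps to its nearest motif
```

In the full method, the codebook would be trained jointly with the encoder, and the paper's progress-aware alignment and embodiment adversarial constraints would shape which segments map to the same motif across robots; none of that machinery is modeled here.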
Computer Science > Robotics
arXiv:2602.13764 (cs) [Submitted on 14 Feb 2026]
Title: MOTIF: Learning Action Motifs for Few-shot Cross-Embodiment Transfer
Authors: Heng Zhi, Wentao Tan, Lei Zhu, Fengling Li, Jingjing Li, Guoli Yang, Heng Tao Shen
Abstract: While vision-language-action (VLA) models have advanced generalist robotic learning, cross-embodiment transfer remains challenging due to kinematic heterogeneity and the high cost of collecting sufficient real-world demonstrations to support fine-tuning. Existing cross-embodiment policies typically rely on shared-private architectures, which suffer from limited capacity of private parameters and lack explicit adaptation mechanisms. To address these limitations, we introduce MOTIF for efficient few-shot cross-embodiment transfer that decouples embodiment-agnostic spatiotemporal patterns, termed action motifs, from heterogeneous action data. Specifically, MOTIF first learns unified motifs via vector quantization with progress-aware alignment and embodiment adversarial constraints to ensure temporal and cross-embodiment consistency. We then design a lightweight predictor that predicts these motifs from real-time inputs to guide a flow-matching policy, fusing them with robot-specific states to enable action generation on new embodiments. Evaluations across both simulatio...
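The abstract's second stage, a predictor whose motifs condition a flow-matching policy, can be sketched at the level of the training objective. Standard flow matching with a linear interpolant is assumed; the linear velocity head, dimensions, and variable names are illustrative, not the paper's architecture.

```python
import numpy as np

def flow_matching_pair(noise, action, t):
    """Linear-interpolant flow matching: a point on the noise->action path
    and the constant target velocity of that straight path."""
    x_t = (1.0 - t) * noise + t * action
    v_target = action - noise
    return x_t, v_target

def policy_velocity(x_t, t, motif, state, W):
    """Toy linear velocity head conditioned on the predicted motif and the
    robot-specific state (stands in for the learned policy network)."""
    feats = np.concatenate([x_t, [t], motif, state])
    return W @ feats

rng = np.random.default_rng(1)
noise, action = rng.normal(size=4), rng.normal(size=4)  # 4-dim toy actions
motif, state = rng.normal(size=3), rng.normal(size=2)   # conditioning inputs
t = 0.3
x_t, v_tgt = flow_matching_pair(noise, action, t)

W = 0.1 * rng.normal(size=(4, 4 + 1 + 3 + 2))           # untrained toy weights
v_pred = policy_velocity(x_t, t, motif, state, W)
loss = float(((v_pred - v_tgt) ** 2).mean())            # regression objective
```

Training would minimize this velocity-regression loss over demonstrations; at inference, integrating the learned velocity field from noise to `t = 1` generates actions for the new embodiment, with the motif supplying the embodiment-agnostic guidance.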