[2604.03552] CRAFT: Video Diffusion for Bimanual Robot Data Generation

arXiv - AI · 4 min read

Computer Science > Robotics
arXiv:2604.03552 (cs) · Submitted on 4 Apr 2026

Title: CRAFT: Video Diffusion for Bimanual Robot Data Generation
Authors: Jason Chen, I-Chun Arthur Liu, Gaurav Sukhatme, Daniel Seita

Abstract: Bimanual robot learning from demonstrations is fundamentally limited by the cost and narrow visual diversity of real-world data, which constrains policy robustness across viewpoints, object configurations, and embodiments. We present Canny-guided Robot Data Generation using Video Diffusion Transformers (CRAFT), a video diffusion-based framework for scalable bimanual demonstration generation that synthesizes temporally coherent manipulation videos while producing action labels. By conditioning video diffusion on edge-based structural cues extracted from simulator-generated trajectories, CRAFT produces physically plausible trajectory variations and supports a unified augmentation pipeline spanning object pose changes, camera viewpoints, lighting and background variations, cross-embodiment transfer, and multi-view synthesis. We leverage a pre-trained video diffusion model to convert simulated videos, along with action labels from the simulation trajectories, into action-consistent demonstrations. Starting from only a few real-world demonstrations, CRAFT generates a large, visually diverse set of photorealistic t...
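The abstract outlines an edge-conditioned generation pipeline: Canny-style structural cues are extracted from simulator renderings and used to condition a pretrained video diffusion model, while the simulator's action labels are carried over to the generated videos. The paper itself includes no code, so the sketch below is only a minimal illustration of the edge-extraction step under those assumptions; the thresholds, the `load_sim_rollout` helper, and the `video_diffusion.generate` call are hypothetical placeholders, not APIs from CRAFT.

```python
import cv2
import numpy as np

def canny_control_sequence(frames: list[np.ndarray],
                           low: int = 100, high: int = 200) -> np.ndarray:
    """Extract a per-frame Canny edge map from a simulated rollout.

    These edge maps are the structural cues that would condition the
    video diffusion model; the thresholds are illustrative defaults,
    not values reported in the paper.
    """
    edges = []
    for frame in frames:  # each frame: (H, W, 3) uint8 BGR image
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        edges.append(cv2.Canny(gray, low, high))
    return np.stack(edges)  # (T, H, W) edge video

# Hypothetical usage: condition a pretrained video diffusion model on
# the edge sequence while reusing the simulator's action labels.
# `load_sim_rollout` and `video_diffusion.generate` are placeholders.
#
# frames, actions = load_sim_rollout("rollout_000")
# control = canny_control_sequence(frames)
# video = video_diffusion.generate(control=control,
#                                  prompt="photorealistic tabletop scene")
# dataset.append((video, actions))  # action-consistent demonstration
```

The key property this is meant to convey is that the action labels never change: only the rendering is re-synthesized around the edge skeleton, which is why the generated demonstrations stay action-consistent.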

Originally published on April 07, 2026. Curated by AI News.

Related Articles

[2603.12365] Optimal Experimental Design for Reliable Learning of History-Dependent Constitutive Laws
arXiv - Machine Learning · 4 min

[2603.17573] HeiSD: Hybrid Speculative Decoding for Embodied Vision-Language-Action Models with Kinematic Awareness
arXiv - Machine Learning · 4 min

[2512.20562] Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Feature Learning by Learnable Channel Attention
arXiv - Machine Learning · 4 min

[2603.07475] A Comparative analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
arXiv - Machine Learning · 3 min