[2604.03552] CRAFT: Video Diffusion for Bimanual Robot Data Generation
Computer Science > Robotics

arXiv:2604.03552 (cs)

[Submitted on 4 Apr 2026]

Title: CRAFT: Video Diffusion for Bimanual Robot Data Generation

Authors: Jason Chen, I-Chun Arthur Liu, Gaurav Sukhatme, Daniel Seita

Abstract: Bimanual robot learning from demonstrations is fundamentally limited by the cost and narrow visual diversity of real-world data, which constrains policy robustness across viewpoints, object configurations, and embodiments. We present Canny-guided Robot Data Generation using Video Diffusion Transformers (CRAFT), a video diffusion-based framework for scalable bimanual demonstration generation that synthesizes temporally coherent manipulation videos while producing action labels. By conditioning video diffusion on edge-based structural cues extracted from simulator-generated trajectories, CRAFT produces physically plausible trajectory variations and supports a unified augmentation pipeline spanning object pose changes, camera viewpoints, lighting and background variations, cross-embodiment transfer, and multi-view synthesis. We leverage a pre-trained video diffusion model to convert simulated videos, along with action labels from the simulation trajectories, into action-consistent demonstrations. Starting from only a few real-world demonstrations, CRAFT generates a large, visually diverse set of photorealistic t...
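The abstract's key mechanism is conditioning the video diffusion model on per-frame edge maps extracted from simulated trajectories. The paper names Canny edges specifically; as a minimal sketch of the conditioning signal, the snippet below uses a simplified gradient-magnitude edge detector as a stand-in for Canny, applied frame by frame to a toy simulated clip. The function names, thresholds, and array shapes here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def edge_map(frame: np.ndarray, thresh: float = 0.2) -> np.ndarray:
    """Simplified edge detector (gradient-magnitude threshold) standing in for
    the Canny edges CRAFT conditions on. `frame` is (H, W) grayscale in [0, 1]."""
    gy, gx = np.gradient(frame.astype(np.float64))
    mag = np.hypot(gx, gy)                     # per-pixel gradient magnitude
    return (mag > thresh).astype(np.uint8)     # binary edge mask

def edge_condition_video(frames: np.ndarray) -> np.ndarray:
    """Per-frame edge maps for a (T, H, W) simulated clip. In CRAFT these maps
    serve as the structural conditioning fed to the video diffusion model."""
    return np.stack([edge_map(f) for f in frames])

# Toy "simulated trajectory": a bright square translating across a dark scene.
T, H, W = 4, 32, 32
clip = np.zeros((T, H, W))
for t in range(T):
    clip[t, 10:20, 5 + 4 * t : 15 + 4 * t] = 1.0

cond = edge_condition_video(clip)  # (4, 32, 32) stack of edge masks
```

The edge stack preserves the geometry and motion of the simulated rollout while discarding texture and lighting, which is what lets the diffusion model repaint appearance (backgrounds, lighting, even embodiment) without breaking the action-consistent trajectory.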