[2512.22854] ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning
Computer Science > Computer Vision and Pattern Recognition

arXiv:2512.22854 (cs)

[Submitted on 28 Dec 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: ByteLoom: Weaving Geometry-Consistent Human-Object Interactions through Progressive Curriculum Learning

Authors: Bangya Liu, Xinyu Gong, Zelin Zhao, Ziyang Song, Yulei Lu, Suhui Wu, Jun Zhang, Suman Banerjee, Hao Zhang

Abstract: Human-object interaction (HOI) video generation has garnered increasing attention due to its promising applications in digital humans, e-commerce, advertising, and robotic imitation learning. However, existing methods face two critical limitations: (1) a lack of effective mechanisms for injecting multi-view information about the object into the model, leading to poor cross-view consistency, and (2) heavy reliance on fine-grained hand-mesh annotations for modeling interaction occlusions. To address these challenges, we introduce ByteLoom, a Diffusion Transformer (DiT)-based framework that generates realistic HOI videos with geometrically consistent object appearance, using simplified human conditioning and 3D object inputs. We first propose an RCM-cache mechanism that leverages Relative Coordinate Maps (RCM) as a universal representation to maintain the object's geometric consistency and precisely contro...
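The abstract does not define how a Relative Coordinate Map is computed. As a rough intuition only (a hypothetical sketch, not the paper's implementation), a relative coordinate representation typically expresses each 3D object point by its position within the object's own bounding volume, so the resulting values are invariant to the object's pose in the scene and can be rendered as an RGB-like conditioning map:

```python
import numpy as np

def relative_coordinate_map(points: np.ndarray) -> np.ndarray:
    """Map 3D points to coordinates relative to their axis-aligned
    bounding box, producing values in [0, 1]^3.

    Hypothetical illustration of a relative-coordinate representation;
    the paper's actual RCM construction is not specified in the abstract.
    """
    lo = points.min(axis=0)
    hi = points.max(axis=0)
    extent = np.maximum(hi - lo, 1e-8)  # guard against zero extent
    return (points - lo) / extent

# Toy object: a shifted, scaled tetrahedron.
pts = np.array([[2.0, 2.0, 2.0],
                [4.0, 2.0, 2.0],
                [2.0, 4.0, 2.0],
                [2.0, 2.0, 4.0]])
rcm = relative_coordinate_map(pts)
# Each row is now a normalized coordinate inside the object's box,
# unchanged if the whole object is translated or uniformly rescaled.
```

Because the map depends only on where a point sits inside the object, the same surface point receives the same value from every camera view, which is one plausible reason such a representation would help cross-view consistency.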