[2602.13444] FlowHOI: Flow-based Semantics-Grounded Generation of Hand-Object Interactions for Dexterous Robot Manipulation
Summary
FlowHOI is a two-stage flow-matching framework that generates semantically grounded, temporally coherent hand-object interaction sequences for dexterous robot manipulation, improving both the fidelity of generated motions and the speed of inference.
Why It Matters
This research addresses a critical gap in robotics: vision-language-action models often fail in long-horizon, contact-rich tasks because the underlying hand-object interaction structure is not explicitly represented. By generating an embodiment-agnostic interaction representation that captures this structure, FlowHOI makes manipulation behaviors easier to validate and transfer across robots.
Key Takeaways
- FlowHOI introduces a two-stage flow-matching framework for generating hand-object interactions.
- The framework enhances action recognition accuracy and physics simulation success rates.
- It achieves significant inference speedups compared to existing methods.
- Real-robot execution demonstrates the practical applicability of the generated interactions.
- The research addresses the scarcity of high-fidelity supervision for hand-object interactions.
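The two-stage flow-matching generation referenced above can be illustrated with a minimal conditional flow-matching training loop. This is a generic NumPy sketch, not the authors' implementation: the toy pose data, the linear velocity model, and all names are hypothetical stand-ins for the paper's deep networks and HOI representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for flattened HOI pose vectors (hypothetical dimension).
DIM = 8
data = rng.normal(loc=2.0, scale=0.5, size=(256, DIM))

# A linear velocity model v_theta([x_t, t]) -> velocity; real systems use deep nets.
W = np.zeros((DIM + 1, DIM))

def predict(W, xt, t):
    feats = np.concatenate([xt, t], axis=1)   # (B, DIM+1)
    return feats @ W

def fm_loss_and_grad(W, x1):
    """Conditional flow matching: regress v_theta(x_t, t) onto (x1 - x0)."""
    B = x1.shape[0]
    x0 = rng.normal(size=x1.shape)            # noise endpoint of the path
    t = rng.uniform(size=(B, 1))              # time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # linear interpolation path
    target = x1 - x0                          # constant velocity along that path
    err = predict(W, xt, t) - target
    loss = float(np.mean(err ** 2))
    feats = np.concatenate([xt, t], axis=1)
    grad = 2.0 * feats.T @ err / (B * DIM)    # gradient of the mean-squared error
    return loss, grad

# A few gradient steps: the regression loss should decrease.
losses = []
for _ in range(200):
    loss, grad = fm_loss_and_grad(W, data)
    W -= 0.05 * grad
    losses.append(loss)
```

At inference time, a sample is drawn by integrating the learned velocity field from noise at t=0 to t=1 (e.g., with a few Euler steps), which is where flow matching gets its speed advantage over many-step diffusion samplers.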
Computer Science > Robotics
arXiv:2602.13444 (cs)
[Submitted on 13 Feb 2026]
Title: FlowHOI: Flow-based Semantics-Grounded Generation of Hand-Object Interactions for Dexterous Robot Manipulation
Authors: Huajian Zeng, Lingyun Chen, Jiaqi Yang, Yuantai Zhang, Fan Shi, Peidong Liu, Xingxing Zuo
Abstract: Recent vision-language-action (VLA) models can generate plausible end-effector motions, yet they often fail in long-horizon, contact-rich tasks because the underlying hand-object interaction (HOI) structure is not explicitly represented. An embodiment-agnostic interaction representation that captures this structure would make manipulation behaviors easier to validate and transfer across robots. We propose FlowHOI, a two-stage flow-matching framework that generates semantically grounded, temporally coherent HOI sequences, comprising hand poses, object poses, and hand-object contact states, conditioned on an egocentric observation, a language instruction, and a 3D Gaussian splatting (3DGS) scene reconstruction. We decouple geometry-centric grasping from semantics-centric manipulation, conditioning the latter on compact 3D scene tokens and employing a motion-text alignment loss to semantically ground th...
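The motion-text alignment loss mentioned in the abstract is not spelled out on this page. A common choice for such alignment is a symmetric InfoNCE (CLIP-style) contrastive loss over paired motion and text embeddings; the sketch below illustrates that generic form with hypothetical embeddings and should not be read as the paper's exact formulation.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

def motion_text_alignment_loss(motion_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each matrix is a matched motion/text pair."""
    m = motion_emb / np.linalg.norm(motion_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = m @ t.T / temperature                  # scaled cosine similarities
    diag = np.arange(logits.shape[0])
    # Cross-entropy with the matched pair on the diagonal, in both directions.
    loss_mt = -(logits[diag, diag] - _logsumexp(logits, axis=1)[:, 0]).mean()
    loss_tm = -(logits[diag, diag] - _logsumexp(logits.T, axis=1)[:, 0]).mean()
    return 0.5 * (loss_mt + loss_tm)

# Matched pairs should score lower (better) than mismatched pairs.
rng = np.random.default_rng(1)
emb = rng.normal(size=(16, 32))
aligned = motion_text_alignment_loss(emb, emb)
mismatched = motion_text_alignment_loss(emb, rng.normal(size=(16, 32)))
```

Pulling each motion embedding toward its paired instruction embedding, and away from the other instructions in the batch, is one standard way to "semantically ground" generated motion in language.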