[2602.19193] Visual Prompt Guided Unified Pushing Policy
Summary
The paper proposes a unified pushing policy that adds a lightweight visual prompting mechanism to a flow matching policy, so that a single policy can generate reactive, multimodal pushing actions. In the reported experiments it outperforms existing baselines.
Why It Matters
Existing pushing approaches typically rely on multi-step plans built from pre-defined primitives with limited application scope, which restricts their efficiency and versatility. Because the visual prompt here can be specified by a high-level planner, the same policy can be reused across a wide range of planning problems.
Key Takeaways
- Introduces a unified pushing policy that integrates visual prompts.
- Generates reactive, multimodal pushing actions rather than relying on multi-step plans built from pre-defined primitives.
- Demonstrates superior performance compared to existing baselines.
- Lets a high-level planner specify the visual prompt, so the policy can be reused across a wide range of planning problems.
- Serves as a low-level primitive within a VLM-guided planning framework, demonstrated on table-cleaning tasks.
arXiv:2602.19193 (cs) [Submitted on 22 Feb 2026]
Title: Visual Prompt Guided Unified Pushing Policy
Authors: Hieu Bui, Ziyan Gao, Yuya Hosoda, Joo-Ho Lee
Abstract: As one of the simplest non-prehensile manipulation skills, pushing has been widely studied as an effective means to rearrange objects. Existing approaches, however, typically rely on multi-step push plans composed of pre-defined pushing primitives with limited application scopes, which restrict their efficiency and versatility across different scenarios. In this work, we propose a unified pushing policy that incorporates a lightweight prompting mechanism into a flow matching policy to guide the generation of reactive, multimodal pushing actions. The visual prompt can be specified by a high-level planner, enabling the reuse of the pushing policy across a wide range of planning problems. Experimental results demonstrate that the proposed unified pushing policy not only outperforms existing baselines but also effectively serves as a low-level primitive within a VLM-guided planning framework to solve table-cleaning tasks efficiently.
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.19193 [cs.RO] (or arXiv:2602.19193v1 [cs.RO] for this version)
https://doi.org/10.48550/arXiv.2602.19193
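The abstract describes a flow matching policy whose action generation is guided by a prompt. As a rough illustration of how sampling from such a policy generally works, the sketch below integrates a prompt-conditioned velocity field from Gaussian noise at t = 0 to an action at t = 1 with Euler steps. Everything in it is an assumption made for illustration, not the authors' method: the names `velocity` and `sample_action` are hypothetical, the prompt is reduced to a 2-D target point, and the learned network is replaced by the analytic velocity field of a straight-line path toward that point.

```python
import random


def velocity(x, t, prompt):
    """Hypothetical stand-in for a learned network v(x, t | prompt).

    This is the analytic velocity of the straight-line path
    x_t = (1 - t) * x_0 + t * prompt, which is (prompt - x_t) / (1 - t).
    A real policy would learn this field from demonstrations.
    """
    return [(p - xi) / (1.0 - t) for xi, p in zip(x, prompt)]


def sample_action(prompt, rng, n_steps=10):
    """Draw one action by Euler-integrating the velocity field from t=0 to t=1."""
    # Start from Gaussian noise, x_0 ~ N(0, I).
    x = [rng.gauss(0.0, 1.0) for _ in prompt]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        v = velocity(x, t, prompt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    prompt = [0.3, -0.2]  # assumed 2-D prompt, e.g. a normalized image point
    print(sample_action(prompt, rng))
```

In this toy field every noise sample is carried to the prompt point, so the output is unimodal. A learned field would in general map different noise samples to different valid actions, which is where the multimodality mentioned in the abstract would come from.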