Robotics · AI Agents · Computer Vision

[2602.19193] Visual Prompt Guided Unified Pushing Policy

arXiv - AI February 24, 2026 3 min read Article

Summary

The paper presents a unified pushing policy that uses visual prompts to guide a flow matching policy toward reactive, multimodal pushing actions. The authors report that it outperforms existing baselines.

Why It Matters

Existing pushing approaches typically rely on multi-step plans built from pre-defined primitives with limited application scope, which restricts their efficiency and versatility. This work replaces them with a single prompt-conditioned policy that a high-level planner can reuse across many planning problems, which makes pushing more adaptable as a building block for real-world manipulation.

Key Takeaways

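Introduces a unified pushing policy that adds a lightweight visual prompting mechanism to a flow matching policy.
Generates reactive, multimodal pushing actions instead of relying on multi-step plans built from pre-defined primitives.
Outperforms existing baselines in the reported experiments.
Accepts visual prompts from a high-level planner, so the same policy can be reused across a wide range of planning problems.
Serves as a low-level primitive within a VLM-guided planning framework for table-cleaning tasks.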

arXiv:2602.19193 [Submitted on 22 Feb 2026]

Title: Visual Prompt Guided Unified Pushing Policy

Authors: Hieu Bui, Ziyan Gao, Yuya Hosoda, Joo-Ho Lee

Abstract: As one of the simplest non-prehensile manipulation skills, pushing has been widely studied as an effective means to rearrange objects. Existing approaches, however, typically rely on multi-step push plans composed of pre-defined pushing primitives with limited application scopes, which restrict their efficiency and versatility across different scenarios. In this work, we propose a unified pushing policy that incorporates a lightweight prompting mechanism into a flow matching policy to guide the generation of reactive, multimodal pushing actions. The visual prompt can be specified by a high-level planner, enabling the reuse of the pushing policy across a wide range of planning problems. Experimental results demonstrate that the proposed unified pushing policy not only outperforms existing baselines but also effectively serves as a low-level primitive within a VLM-guided planning framework to solve table-cleaning tasks efficiently.

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Cite as: arXiv:2602.19193 [cs.RO] (or arXiv:2602.19193v1 [cs.RO] for this version)

https://doi.org/10.48550/arXiv.2602.19193
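To make the idea of a prompt-conditioned flow matching policy more concrete, here is a minimal sketch of the sampling loop such a policy might use: start from noise and integrate a velocity field from t=0 to t=1. Everything in it is an assumption for illustration and not the authors' implementation. The prompt is treated as a single 2D target point, `toy_velocity` is a closed-form stand-in for a trained network, and the names `toy_velocity` and `sample_push_action` are hypothetical. Because the stand-in field has a single target, it does not reproduce the multimodal behavior described in the abstract.

```python
import random


def toy_velocity(action, t, prompt_xy):
    """Stand-in for a trained velocity network v(a, t | observation, prompt).

    Assumption: the prompt is one 2D target point. For a straight-line
    probability path ending at that point, the exact velocity field is
    (target - a) / (1 - t), used here in place of a learned model.
    """
    return [(p - a) / (1.0 - t) for a, p in zip(action, prompt_xy)]


def sample_push_action(prompt_xy, num_steps=10, seed=0):
    """Euler-integrate da/dt = v(a, t | prompt) from t=0 (noise) to t=1."""
    rng = random.Random(seed)
    # Flow matching sampling starts from a draw of the noise distribution.
    action = [rng.gauss(0.0, 1.0) for _ in prompt_xy]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        velocity = toy_velocity(action, t, prompt_xy)
        action = [a + dt * v for a, v in zip(action, velocity)]
    return action


if __name__ == "__main__":
    # Hypothetical prompt: a target point in normalized table coordinates.
    print(sample_push_action([0.3, -0.2]))
```

In a real system the velocity function would be a neural network conditioned on camera observations and the visual prompt, and a high-level planner would supply the prompt; the loop structure above is the only part this sketch is meant to convey.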


Machine Learning

[2601.07855] RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution

Abstract page for arXiv paper 2601.07855: RoAD Benchmark: How LiDAR Models Fail under Coupled Domain Shifts and Label Evolution

arXiv - AI · 3 min · about 7 hours ago

LLMs

[2502.00262] INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Hazard Detection and Edge Case Evaluation

Abstract page for arXiv paper 2502.00262: INSIGHT: Enhancing Autonomous Driving Safety through Vision-Language Models on Context-Aware Ha...

arXiv - AI · 4 min · about 7 hours ago

LLMs

[2508.00500] ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety

Abstract page for arXiv paper 2508.00500: ProbGuard: Probabilistic Runtime Monitoring for LLM Agent Safety

arXiv - AI · 4 min · about 7 hours ago

Robotics

[2603.26660] Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning

Abstract page for arXiv paper 2603.26660: Ruka-v2: Tendon Driven Open-Source Dexterous Hand with Wrist and Abduction for Robot Learning

arXiv - AI · 4 min · about 7 hours ago
