[2602.18374] Zero-shot Interactive Perception

Summary

The paper presents Zero-Shot Interactive Perception (ZS-IP), a framework that couples pushing and grasping with a memory-driven Vision Language Model (VLM), so that a robot can physically interact with a scene to resolve occlusions and answer semantic queries.

Why It Matters

As robotics increasingly integrates AI for complex tasks, ZS-IP offers a novel approach to enhance robots' interaction capabilities in partially observable scenarios. This advancement could significantly impact fields such as automation, manufacturing, and service robotics, where effective manipulation of objects is crucial.

Key Takeaways

  • ZS-IP combines multi-strategy manipulation with a memory-driven Vision Language Model.
  • The Enhanced Observation (EO) module augments the VLM's visual input with conventional keypoints and with pushlines, a 2D visual augmentation tailored to pushing actions.
  • ZS-IP outperforms traditional methods in pushing tasks while preserving non-target objects.
  • The framework is tested on a 7-DOF Franka Panda arm in diverse scenarios.
  • This research addresses challenges in occlusion and ambiguity in robotic tasks.

arXiv:2602.18374 (Computer Science > Robotics), submitted on 20 Feb 2026
Title: Zero-shot Interactive Perception
Authors: Venkatesh Sripada, Frank Guerin, Amir Ghalamzan

Abstract: Interactive perception (IP) enables robots to extract hidden information in their workspace and execute manipulation plans by physically interacting with objects and altering the state of the environment -- crucial for resolving occlusions and ambiguity in complex, partially observable scenarios. We present Zero-Shot IP (ZS-IP), a novel framework that couples multi-strategy manipulation (pushing and grasping) with a memory-driven Vision Language Model (VLM) to guide robotic interactions and resolve semantic queries. ZS-IP integrates three key components: (1) an Enhanced Observation (EO) module that augments the VLM's visual perception with both conventional keypoints and our proposed pushlines -- a novel 2D visual augmentation tailored to pushing actions, (2) a memory-guided action module that reinforces semantic reasoning through context lookup, and (3) a robotic controller that executes pushing, pulling, or grasping based on VLM output. Unlike grid-based augmentations optimized for pick-and-place, pushlines capture affordances for contact-rich actions, substantially improving pushing performance. We evaluate ZS-IP on a 7-DOF Franka Panda arm across diverse scen...
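To make the three-component structure in the abstract more concrete, here is a minimal Python sketch of an observe, decide, act loop with a memory log. It is not the authors' implementation: the abstract does not specify any interfaces, so every name here (`Pushline`, `Decision`, `Memory`, `run_episode`, and the stub functions) is hypothetical. Treating a pushline as a labeled 2D segment is also an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # pixel coordinates (x, y)


@dataclass
class Pushline:
    """Hypothetical pushline: a labeled 2D segment drawn on the image."""
    label: str
    start: Point
    end: Point


@dataclass
class Decision:
    """Hypothetical VLM output: an action and, at the end, an answer."""
    action: str        # "push", "pull", "grasp", or "answer"
    target: str = ""   # label of the chosen keypoint or pushline
    answer: str = ""   # only meaningful when action == "answer"


@dataclass
class Memory:
    """Log of past steps, fed back to the VLM as text context."""
    entries: List[str] = field(default_factory=list)

    def add(self, note: str) -> None:
        self.entries.append(note)

    def context(self, last_n: int = 5) -> str:
        return "\n".join(self.entries[-last_n:])


def run_episode(
    query: str,
    observe: Callable[[], Tuple[List[str], List[Pushline]]],
    vlm: Callable[[str, List[str], List[Pushline], str], Decision],
    execute: Callable[[Decision], None],
    max_steps: int = 5,
) -> Tuple[Optional[str], Memory]:
    """Interact with the scene until the VLM answers or steps run out."""
    memory = Memory()
    for step in range(max_steps):
        keypoints, pushlines = observe()          # (1) enhanced observation
        decision = vlm(query, keypoints, pushlines, memory.context())  # (2)
        if decision.action == "answer":
            return decision.answer, memory
        execute(decision)                         # (3) robot controller
        memory.add(f"step {step}: {decision.action} on {decision.target}")
    return None, memory


# Stubs standing in for the camera, the VLM, and the robot (all assumed).
def fake_observe() -> Tuple[List[str], List[Pushline]]:
    return ["k1", "k2"], [Pushline("p1", (10.0, 20.0), (60.0, 20.0))]


def fake_vlm(query, keypoints, pushlines, context) -> Decision:
    # Push once to uncover the scene, then answer the query.
    if "push" not in context:
        return Decision(action="push", target=pushlines[0].label)
    return Decision(action="answer", answer="red mug")


executed: List[Decision] = []
answer, memory = run_episode(
    "What is behind the box?", fake_observe, fake_vlm, executed.append
)
print(answer, memory.entries)
```

In this sketch the memory is a plain text log passed back as context on every step. That is one simple way to read "context lookup" and is not necessarily how the paper implements it.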

