[2602.18374] Zero-shot Interactive Perception
Summary
The paper presents Zero-Shot Interactive Perception (ZS-IP), a framework that enhances robotic manipulation through a memory-driven Vision Language Model, improving performance in complex environments.
Why It Matters
As robotics increasingly integrates AI for complex tasks, ZS-IP offers a novel approach to enhance robots' interaction capabilities in partially observable scenarios. This advancement could significantly impact fields such as automation, manufacturing, and service robotics, where effective manipulation of objects is crucial.
Key Takeaways
- ZS-IP combines multi-strategy manipulation with a memory-driven Vision Language Model.
- The Enhanced Observation module introduces pushlines for improved visual perception.
- ZS-IP outperforms traditional methods on pushing tasks while leaving non-target objects undisturbed.
- The framework is tested on a 7-DOF Franka Panda arm in diverse scenarios.
- This research addresses challenges in occlusion and ambiguity in robotic tasks.
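The takeaways above mention pushlines, a 2D visual augmentation tailored to pushing actions. The paper's exact construction is not given here, so the sketch below is a minimal, hypothetical version: it generates labeled push-direction candidates as short segments that approach an object from several angles and end at its boundary, the kind of overlay a VLM could then select from by label. The function name, parameters, and geometry are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pushline_candidates(centroid, radius, n_lines=8, length=60.0):
    """Hypothetical pushline generator (illustrative, not the paper's method).

    Each candidate is a labeled 2D segment of the given push `length` that
    starts outside the object and ends at its boundary (`radius` from the
    centroid), pointing through the centroid -- i.e. a straight push direction.
    Returns a list of (label, start_xy, end_xy) tuples.
    """
    cx, cy = centroid
    lines = []
    for i in range(n_lines):
        theta = 2 * np.pi * i / n_lines          # evenly spaced approach angles
        d = np.array([np.cos(theta), np.sin(theta)])
        start = np.array([cx, cy]) + (radius + length) * d  # approach point
        end = np.array([cx, cy]) + radius * d               # contact point
        lines.append((str(i), start, end))
    return lines
```

In use, each labeled segment would be drawn on the observation image so the VLM can answer a query like "which push reveals the hidden object?" with a label rather than raw coordinates.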
Abstract
Computer Science > Robotics | arXiv:2602.18374 (cs) | Submitted on 20 Feb 2026
Authors: Venkatesh Sripada, Frank Guerin, Amir Ghalamzan
Interactive perception (IP) enables robots to extract hidden information in their workspace and execute manipulation plans by physically interacting with objects and altering the state of the environment -- crucial for resolving occlusions and ambiguity in complex, partially observable scenarios. We present Zero-Shot IP (ZS-IP), a novel framework that couples multi-strategy manipulation (pushing and grasping) with a memory-driven Vision Language Model (VLM) to guide robotic interactions and resolve semantic queries. ZS-IP integrates three key components: (1) an Enhanced Observation (EO) module that augments the VLM's visual perception with both conventional keypoints and our proposed pushlines -- a novel 2D visual augmentation tailored to pushing actions, (2) a memory-guided action module that reinforces semantic reasoning through context lookup, and (3) a robotic controller that executes pushing, pulling, or grasping based on VLM output. Unlike grid-based augmentations optimized for pick-and-place, pushlines capture affordances for contact-rich actions, substantially improving pushing performance. We evaluate ZS-IP on a 7-DOF Franka Panda arm across diverse scenarios...
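The abstract outlines a three-component loop: an Enhanced Observation module, a memory-guided action module, and a controller that executes the VLM's chosen action. The sketch below wires these together in one perception-action cycle. All names (`Memory`, `interactive_perception_step`, the callables passed in) are assumptions made for illustration; the real system's interfaces are not specified in this summary.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Minimal context memory (illustrative): stores past
    (query, action, outcome) triples and surfaces recent entries
    as extra context for the VLM prompt."""
    entries: list = field(default_factory=list)

    def lookup(self, query, k=3):
        # Return up to the k most recent entries matching this query.
        return [e for e in self.entries if e["query"] == query][-k:]

    def record(self, query, action, outcome):
        self.entries.append({"query": query, "action": action, "outcome": outcome})

def interactive_perception_step(query, observe, augment, vlm, controller, memory):
    """One hypothetical ZS-IP cycle:
    observe -> augment (keypoints + pushlines) -> memory lookup ->
    VLM picks an action -> controller executes it -> record the outcome."""
    image = observe()
    annotated = augment(image)              # EO module: visual overlays
    context = memory.lookup(query)          # memory-guided reasoning
    action = vlm(annotated, query, context) # e.g. {"type": "push", "target": "3"}
    outcome = controller(action)            # push / pull / grasp execution
    memory.record(query, action, outcome)
    return action, outcome
```

With stub callables, one call runs a full cycle and grows the memory by one entry, which the next cycle's `lookup` can then exploit.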