[2505.18487] Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning

arXiv - Machine Learning · 3 min read

Summary

This paper explores how grounding bodily awareness in visual representations can enhance policy learning for robotic manipulation, introducing a novel contrastive learning method called ICon.

Why It Matters

As robotics continues to evolve, effective policy learning is crucial for improving robotic manipulation tasks. This research addresses a fundamental challenge by integrating bodily awareness into visual representations, potentially leading to more efficient and adaptable robotic systems.

Key Takeaways

  • The paper introduces ICon, a contrastive learning method that enhances visual representations for robots.
  • ICon separates agent-specific and environment-specific tokens, resulting in improved policy learning.
  • The framework can be integrated into existing policy learning systems, enhancing performance across various tasks.
  • Experiments indicate that ICon facilitates policy transfer between different robotic platforms.
  • The findings could lead to advancements in robotic manipulation efficiency and adaptability.

Computer Science > Robotics — arXiv:2505.18487 (cs)
[Submitted on 24 May 2025 (v1), last revised 14 Feb 2026 (this version, v2)]

Title: Grounding Bodily Awareness in Visual Representations for Efficient Policy Learning
Authors: Junlin Wang, Zhiyun Lin
Subjects: Robotics (cs.RO)

Abstract: Learning effective visual representations for robotic manipulation remains a fundamental challenge due to the complex body dynamics involved in action execution. In this paper, we study how visual representations that carry body-relevant cues can enable efficient policy learning for downstream robotic manipulation tasks. We present Inter-token Contrast (ICon), a contrastive learning method applied to the token-level representations of Vision Transformers (ViTs). ICon enforces a separation in the feature space between agent-specific and environment-specific tokens, resulting in agent-centric visual representations that embed body-specific inductive biases. This framework can be seamlessly integrated into end-to-end policy learning by incorporating the contrastive loss as an auxiliary objective. Our experiments show that ICon not only improves policy performance across various manipulation tasks but also facilitates policy transfer across different robots. The project website: this https URL
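The abstract's core idea — a contrastive auxiliary loss that separates agent tokens from environment tokens in a ViT's feature space — can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual implementation: the function name, the binary agent mask, and the InfoNCE-style formulation are all assumptions made for the example.

```python
import numpy as np

def inter_token_contrast(tokens, agent_mask, temperature=0.1):
    """Hypothetical sketch of a token-level contrastive loss.

    tokens:     (N, D) array of ViT patch-token embeddings.
    agent_mask: (N,) boolean array, True for agent (robot-body) tokens.
    Pulls same-group tokens together and pushes agent tokens away
    from environment tokens (InfoNCE-style, an assumption here).
    """
    # L2-normalize so dot products are cosine similarities
    z = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = z @ z.T / temperature               # (N, N) similarity logits
    np.fill_diagonal(sim, -np.inf)            # exclude self-pairs

    # positives: token pairs sharing the same agent/environment label
    same_group = agent_mask[:, None] == agent_mask[None, :]
    np.fill_diagonal(same_group, False)

    # log-softmax over each token's candidates, averaged over positives
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos_log_prob = np.where(same_group, log_prob, 0.0).sum(axis=1)
    n_pos = same_group.sum(axis=1)
    valid = n_pos > 0
    return -(pos_log_prob[valid] / n_pos[valid]).mean()
```

In an end-to-end setup of the kind the abstract describes, a loss like this would be added to the policy objective as a weighted auxiliary term (e.g. `total = policy_loss + lam * inter_token_contrast(...)`); the weighting and how the agent mask is obtained are details the summary does not specify.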

Related Articles

Robotics

AIPass Herald

Some insight into building a multi-agent autonomous system. This is like the daily newspaper for the project. A quick read to see how ou...

Reddit - Artificial Intelligence · 1 min ·
[2603.13846] Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video
Machine Learning

Abstract page for arXiv paper 2603.13846: Is Seeing Believing? Evaluating Human Sensitivity to Synthetic Video

arXiv - AI · 3 min ·
[2603.09455] Declarative Scenario-based Testing with RoadLogic
NLP

Abstract page for arXiv paper 2603.09455: Declarative Scenario-based Testing with RoadLogic

arXiv - AI · 3 min ·
[2601.20404] On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents
LLMs

Abstract page for arXiv paper 2601.20404: On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents

arXiv - AI · 4 min ·