[2604.02108] Cross-Modal Visuo-Tactile Object Perception
Computer Science > Robotics

arXiv:2604.02108 (cs) [Submitted on 2 Apr 2026]

Title: Cross-Modal Visuo-Tactile Object Perception

Authors: Anirvan Dutta, Simone Tasciotti, Claudia Cusseddu, Ang Li, Panayiota Poirazi, Julijana Gjorgjieva, Etienne Burdet, Patrick van der Smagt, Mohsen Kaboli

Abstract: Estimating physical properties is critical for safe and efficient autonomous robotic manipulation, particularly during contact-rich interactions. In such settings, vision and tactile sensing provide complementary information about object geometry, pose, inertia, stiffness, and contact dynamics, such as stick-slip behavior. However, these properties are only indirectly observable and cannot always be modeled precisely (e.g., deformation in non-rigid objects coupled with nonlinear contact friction), making the estimation problem inherently complex and requiring sustained exploitation of visuo-tactile sensory information during action. Existing visuo-tactile perception frameworks have primarily emphasized forceful sensor fusion or static cross-modal alignment, with limited consideration of how uncertainty and beliefs about object properties evolve over time. Inspired by human multi-sensory perception and active inference, we propose the Cross-Modal Latent Filter (CMLF) to learn a structured, causal latent state-space of physical object properties....
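To make the filtering idea in the abstract concrete, below is a minimal sketch of one way a cross-modal latent filter of this kind could be structured: a Gaussian belief over latent object properties is propagated through learned dynamics, then corrected by precision-weighted visual and tactile observations so that each modality contributes in proportion to its reported certainty. All names, dimensions, and architectural choices here (CrossModalLatentFilter, the encoders, the GRU transition) are illustrative assumptions, not the authors' CMLF implementation.

    # Hypothetical sketch; not the paper's code. Assumes PyTorch.
    import torch
    import torch.nn as nn

    class CrossModalLatentFilter(nn.Module):
        def __init__(self, z_dim=16, obs_dim=64):
            super().__init__()
            # Learned transition over the latent physical-property state.
            self.dynamics = nn.GRUCell(z_dim, z_dim)
            # Per-modality encoders emit (mean, log-variance) of the latent,
            # so each modality reports its own observation uncertainty.
            self.vision_enc = nn.Linear(obs_dim, 2 * z_dim)
            self.tactile_enc = nn.Linear(obs_dim, 2 * z_dim)
            # Learned (diagonal) process-noise variance, stored in log space.
            self.log_q = nn.Parameter(torch.zeros(z_dim))

        def predict(self, mu, var, z_prev):
            # Propagate the belief mean through the learned dynamics and
            # inflate the variance by the learned process noise.
            mu_pred = self.dynamics(mu, z_prev)
            var_pred = var + self.log_q.exp()
            return mu_pred, var_pred

        def update(self, mu, var, vis_feat, tac_feat):
            # Precision-weighted fusion of prior, vision, and touch:
            # posterior precision is the sum of the individual precisions,
            # and the posterior mean is the precision-weighted average.
            precisions = [1.0 / var]
            means = [mu]
            for enc, feat in ((self.vision_enc, vis_feat),
                              (self.tactile_enc, tac_feat)):
                m, logv = enc(feat).chunk(2, dim=-1)
                precisions.append(1.0 / logv.exp())
                means.append(m)
            prec = sum(precisions)
            fused_mu = sum(p * m for p, m in zip(precisions, means)) / prec
            return fused_mu, 1.0 / prec

Under this sketch, a modality whose encoder predicts high variance (e.g., tactile features before contact is made) is automatically down-weighted, which is one simple way the belief and its uncertainty can evolve over time as the abstract describes.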