[2604.02108] Cross-Modal Visuo-Tactile Object Perception
Computer Science > Robotics

arXiv:2604.02108 (cs) [Submitted on 2 Apr 2026]

Title: Cross-Modal Visuo-Tactile Object Perception

Authors: Anirvan Dutta, Simone Tasciotti, Claudia Cusseddu, Ang Li, Panayiota Poirazi, Julijana Gjorgjieva, Etienne Burdet, Patrick van der Smagt, Mohsen Kaboli

Abstract: Estimating physical properties is critical for safe and efficient autonomous robotic manipulation, particularly during contact-rich interactions. In such settings, vision and tactile sensing provide complementary information about object geometry, pose, inertia, stiffness, and contact dynamics, such as stick-slip behavior. However, these properties are only indirectly observable and cannot always be modeled precisely (e.g., deformation in non-rigid objects coupled with nonlinear contact friction), making the estimation problem inherently complex and requiring sustained exploitation of visuo-tactile sensory information during action. Existing visuo-tactile perception frameworks have primarily emphasized forceful sensor fusion or static cross-modal alignment, with limited consideration of how uncertainty and beliefs about object properties evolve over time. Inspired by human multi-sensory perception and active inference, we propose the Cross-Modal Latent Filter (CMLF) to learn a structured, causal latent state-space of physical object properties....
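To make the filtering idea in the abstract concrete, below is a minimal sketch of one way a cross-modal latent filter of this kind could be structured: a Gaussian belief over latent object properties is propagated through learned dynamics, then corrected by precision-weighted visual and tactile observations so that each modality contributes in proportion to its reported certainty. All names, dimensions, and architectural choices here (CrossModalLatentFilter, the encoders, the GRU transition) are illustrative assumptions, not the authors' CMLF implementation.

    # Hypothetical sketch; not the paper's code. Assumes PyTorch.
    import torch
    import torch.nn as nn

    class CrossModalLatentFilter(nn.Module):
        def __init__(self, z_dim=16, obs_dim=64):
            super().__init__()
            # Learned transition over the latent physical-property state.
            self.dynamics = nn.GRUCell(z_dim, z_dim)
            # Per-modality encoders emit (mean, log-variance) of the latent,
            # so each modality reports its own observation uncertainty.
            self.vision_enc = nn.Linear(obs_dim, 2 * z_dim)
            self.tactile_enc = nn.Linear(obs_dim, 2 * z_dim)
            # Learned (diagonal) process-noise variance, stored in log space.
            self.log_q = nn.Parameter(torch.zeros(z_dim))

        def predict(self, mu, var, z_prev):
            # Propagate the belief mean through the learned dynamics and
            # inflate the variance by the learned process noise.
            mu_pred = self.dynamics(mu, z_prev)
            var_pred = var + self.log_q.exp()
            return mu_pred, var_pred

        def update(self, mu, var, vis_feat, tac_feat):
            # Precision-weighted fusion of prior, vision, and touch:
            # posterior precision is the sum of the individual precisions,
            # and the posterior mean is the precision-weighted average.
            precisions = [1.0 / var]
            means = [mu]
            for enc, feat in ((self.vision_enc, vis_feat),
                              (self.tactile_enc, tac_feat)):
                m, logv = enc(feat).chunk(2, dim=-1)
                precisions.append(1.0 / logv.exp())
                means.append(m)
            prec = sum(precisions)
            fused_mu = sum(p * m for p, m in zip(precisions, means)) / prec
            return fused_mu, 1.0 / prec

Under this sketch, a modality whose encoder predicts high variance (e.g., tactile features before contact is made) is automatically down-weighted, which is one simple way the belief and its uncertainty can evolve over time as the abstract describes.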