[2602.19348] MultiDiffSense: Diffusion-Based Multi-Modal Visuo-Tactile Image Generation Conditioned on Object Shape and Contact Pose

arXiv - AI · 3 min read

Summary

The paper presents MultiDiffSense, a diffusion-based model for generating visuo-tactile images conditioned on object shape and contact poses, improving data collection efficiency for robotic applications.

Why It Matters

Acquiring aligned visuo-tactile datasets is costly and time-consuming; MultiDiffSense addresses this by enabling scalable, controllable synthetic dataset generation, which is crucial for advancing robotics and tactile sensing research.

Key Takeaways

  • MultiDiffSense synthesizes multi-modal images for tactile sensors using a unified diffusion model.
  • The model outperforms a Pix2Pix cGAN baseline, improving SSIM by +36.3% (ViTac), +134.6% (ViTacTip), and +64.7% (TacTip).
  • For downstream 3-DoF pose estimation, training on a 50/50 mix of synthetic and real data halves the required real data while maintaining competitive performance (see the data-mixing sketch after this list).
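
The third takeaway corresponds to the abstract's result that mixing 50% synthetic with 50% real data halves the required real data for downstream 3-DoF pose estimation. Below is a minimal PyTorch sketch of that kind of data mixing; the dataset class, tensor shapes, and set sizes are illustrative assumptions, not the paper's released code.

```python
# Minimal sketch: building a 50/50 real + synthetic training set for a
# downstream 3-DoF pose-estimation model. Shapes and sizes are illustrative.
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader


class TactilePoseDataset(Dataset):
    """Pairs a tactile image with its 3-DoF contact pose (e.g. x, y, yaw)."""

    def __init__(self, images: torch.Tensor, poses: torch.Tensor):
        assert len(images) == len(poses)
        self.images = images  # (N, 3, H, W), float in [0, 1]
        self.poses = poses    # (N, 3)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.poses[idx]


# Placeholder tensors standing in for real captures and diffusion-generated images.
real = TactilePoseDataset(torch.rand(200, 3, 128, 128), torch.rand(200, 3))
synthetic = TactilePoseDataset(torch.rand(200, 3, 128, 128), torch.rand(200, 3))

# 50% real + 50% synthetic: the mixed set is the same size as an all-real set,
# but only half of it had to be collected on physical sensor hardware.
mixed = ConcatDataset([real, synthetic])
loader = DataLoader(mixed, batch_size=32, shuffle=True)

for images, poses in loader:
    # ... train the 3-DoF pose regressor on the mixed batch ...
    break
```

The point of the sketch is the composition of the training set, not the regressor itself: the synthetic half substitutes for real captures that would otherwise need to be collected on hardware.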

arXiv:2602.19348 [cs.CV] · Submitted on 22 Feb 2026

Title: MultiDiffSense: Diffusion-Based Multi-Modal Visuo-Tactile Image Generation Conditioned on Object Shape and Contact Pose

Authors: Sirine Bhouri, Lan Wei, Jian-Qing Zheng, Dandan Zhang

Abstract: Acquiring aligned visuo-tactile datasets is slow and costly, requiring specialised hardware and large-scale data collection. Synthetic generation is promising, but prior methods are typically single-modality, limiting cross-modal learning. We present MultiDiffSense, a unified diffusion model that synthesises images for multiple vision-based tactile sensors (ViTac, TacTip, ViTacTip) within a single architecture. Our approach uses dual conditioning on CAD-derived, pose-aligned depth maps and structured prompts that encode sensor type and 4-DoF contact pose, enabling controllable, physically consistent multi-modal synthesis. Evaluating on 8 objects (5 seen, 3 novel) and unseen poses, MultiDiffSense outperforms a Pix2Pix cGAN baseline in SSIM by +36.3% (ViTac), +134.6% (ViTacTip), and +64.7% (TacTip). For downstream 3-DoF pose estimation, mixing 50% synthetic with 50% real halves the required real data while maintaining competitive performance. MultiDiffSense alle...
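
The abstract's central mechanism is dual conditioning: a CAD-derived, pose-aligned depth map plus a structured prompt encoding sensor type and 4-DoF contact pose. As a rough illustration of that interface only, the sketch below uses an off-the-shelf depth-conditioned diffusion pipeline (Hugging Face diffusers with a ControlNet) as a stand-in for the paper's unified model; the checkpoints, prompt format, and pose values are assumptions, not the authors' implementation.

```python
# Sketch of dual conditioning: pose-aligned depth map + structured text prompt.
# A generic depth ControlNet pipeline stands in for MultiDiffSense's trained model.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# CAD-derived, pose-aligned depth map rendered for the chosen contact pose
# (here just a placeholder grayscale image in place of a real depth render).
depth_map = Image.new("L", (512, 512), color=128).convert("RGB")

# Structured prompt: sensor type + 4-DoF contact pose (illustrative format).
sensor = "ViTacTip"
x_mm, y_mm, z_mm, yaw_deg = 2.0, -1.5, 0.8, 15.0
prompt = (
    f"tactile image, sensor: {sensor}, "
    f"contact pose: x={x_mm}mm y={y_mm}mm z={z_mm}mm yaw={yaw_deg}deg"
)

# Off-the-shelf depth ControlNet as a stand-in for the trained conditioning branch.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The depth map carries object shape and pose geometry; the prompt carries
# sensor identity and pose metadata, together steering the synthesis.
generated = pipe(prompt, image=depth_map, num_inference_steps=30).images[0]
generated.save("synthetic_vitactip.png")
```

In the paper's framing, the same dual-conditioned model covers all three sensor types, so switching the `sensor` field of the prompt changes which sensor's imagery is synthesised without swapping models.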
