[2604.03322] VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing
Computer Science > Computer Vision and Pattern Recognition
arXiv:2604.03322 (cs)
[Submitted on 2 Apr 2026]

Title: VitaTouch: Property-Aware Vision-Tactile-Language Model for Robotic Quality Inspection in Manufacturing

Authors: Junyi Zong, Qingxuan Jia, Meixian Shi, Tong Li, Jiayuan Li, Zihang Lv, Gang Chen, Fang Deng

Abstract: Quality inspection in smart manufacturing requires identifying intrinsic material and surface properties beyond visible geometry, yet vision-only methods remain vulnerable to occlusion and reflection. We propose VitaTouch, a property-aware vision-tactile-language model for material-property inference and natural-language attribute description. VitaTouch uses modality-specific encoders and a dual Q-Former to extract language-relevant visual and tactile features, which are compressed into prefix tokens for a large language model. We align each modality with text and explicitly couple vision and touch through contrastive learning. We also construct VitaSet, a multimodal dataset with 186 objects, 52k images, and 5.1k human-verified instruction-answer pairs. VitaTouch achieves the best performance on HCT and the overall TVL benchmark, while remaining competitive on SSVTP. On VitaSet, it reaches 88.89% hardness accuracy, 75.13% roughness accuracy, and 54.81% descr...
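
The abstract describes the architecture only at a high level. Below is a minimal sketch, assuming PyTorch, of how language-relevant visual and tactile features could be compressed into prefix tokens via per-modality Q-Former-style bottlenecks and coupled with a contrastive loss; the module names, dimensions, and the InfoNCE formulation are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class QFormerBottleneck(nn.Module):
    """Hypothetical stand-in for one branch of the dual Q-Former:
    learnable queries cross-attend to encoder features and are
    projected into the LLM embedding space as prefix tokens."""

    def __init__(self, feat_dim=768, num_queries=32, llm_dim=4096, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.to_prefix = nn.Linear(feat_dim, llm_dim)

    def forward(self, feats):                       # feats: (B, N, feat_dim)
        q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
        out, _ = self.cross_attn(q, feats, feats)   # (B, num_queries, feat_dim)
        return self.to_prefix(out)                  # (B, num_queries, llm_dim)


def info_nce(x, y, temperature=0.07):
    """Symmetric InfoNCE loss between pooled vision and tactile embeddings
    (one plausible way to 'explicitly couple vision and touch')."""
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    logits = x @ y.t() / temperature                # (B, B) similarity matrix
    labels = torch.arange(x.size(0), device=x.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))


# Toy forward pass; random tensors stand in for encoder outputs.
B, N, D = 4, 196, 768
vision_feats = torch.randn(B, N, D)    # from a vision encoder (assumed)
tactile_feats = torch.randn(B, N, D)   # from a tactile encoder (assumed)

vision_qformer = QFormerBottleneck(D)
tactile_qformer = QFormerBottleneck(D)

vision_prefix = vision_qformer(vision_feats)        # prefix tokens for the LLM
tactile_prefix = tactile_qformer(tactile_feats)

# Contrastive coupling of vision and touch on pooled query embeddings.
loss_vt = info_nce(vision_prefix.mean(dim=1), tactile_prefix.mean(dim=1))

# The prefix tokens would then be concatenated with text-token embeddings
# and fed to the language model for attribute description generation.
prefix = torch.cat([vision_prefix, tactile_prefix], dim=1)  # (B, 64, llm_dim)
print(prefix.shape, loss_vt.item())
```

In this sketch each modality gets its own query set and projection, so the two branches can specialize while still being tied together by the contrastive term; whether the paper shares weights between branches or freezes the LLM is not stated in the abstract.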