[2603.23481] VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs
Computer Science > Robotics

arXiv:2603.23481 (cs)

[Submitted on 24 Mar 2026]

Title: VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Authors: Haoran Yuan, Weigang Yi, Zhenyu Zhang, Wendi Chen, Yuchen Mo, Jiashi Yin, Xinzhuo Li, Xiangyu Zeng, Chuan Wen, Cewu Lu, Katherine Driggs-Campbell, Ismini Lourentzou

Abstract: Video-Action Models (VAMs) have emerged as a promising framework for embodied intelligence, learning implicit world dynamics from raw video streams to produce temporally consistent action predictions. Although such models demonstrate strong performance on long-horizon tasks through visual reasoning, they remain limited in contact-rich scenarios where critical interaction states are only partially observable from vision alone. In particular, fine-grained force modulation and contact transitions are not reliably encoded in visual tokens, leading to unstable or imprecise behaviors. To bridge this gap, we introduce the Video-Tactile Action Model (VTAM), a multimodal world modeling framework that incorporates tactile perception as a complementary grounding signal. VTAM augments a pretrained video transformer with tactile streams via lightweight modality-transfer finetuning, enabling efficient cross-modal representation learning without tactile-language paired data or independent tactil...
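The abstract only states the high-level idea of augmenting a frozen pretrained video transformer with tactile streams through lightweight finetuning. The following is a minimal conceptual sketch of that kind of fusion, not the authors' implementation: all module names, dimensions, the adapter design, and the action head are illustrative assumptions.

```python
# Conceptual sketch (not the paper's code): tactile readings are projected into
# tokens in the video-token embedding space, mixed by a frozen "pretrained"
# video transformer, and refined by a small trainable adapter before an action head.
import torch
import torch.nn as nn


class TactileAdapterFusion(nn.Module):
    def __init__(self, video_dim=768, tactile_dim=32, action_dim=7):
        super().__init__()
        # Stand-in for a pretrained video transformer; weights are frozen so
        # only the tactile pathway and adapter are finetuned.
        layer = nn.TransformerEncoderLayer(d_model=video_dim, nhead=8, batch_first=True)
        self.video_backbone = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.video_backbone.parameters():
            p.requires_grad = False

        # Lightweight tactile encoder: raw tactile readings -> tokens in video space.
        self.tactile_proj = nn.Sequential(
            nn.Linear(tactile_dim, video_dim),
            nn.GELU(),
            nn.Linear(video_dim, video_dim),
        )
        # Small trainable adapter applied residually after cross-modal mixing.
        self.adapter = nn.Sequential(
            nn.Linear(video_dim, video_dim // 4),
            nn.GELU(),
            nn.Linear(video_dim // 4, video_dim),
        )
        # Action head: pooled multimodal features -> a single action vector.
        self.action_head = nn.Linear(video_dim, action_dim)

    def forward(self, video_tokens, tactile_signals):
        # video_tokens:    (B, T_v, video_dim)   tokens from the video stream
        # tactile_signals: (B, T_t, tactile_dim) raw tactile sensor readings
        tactile_tokens = self.tactile_proj(tactile_signals)       # (B, T_t, video_dim)
        fused = torch.cat([video_tokens, tactile_tokens], dim=1)  # concatenate token sequences
        mixed = self.video_backbone(fused)                        # frozen cross-modal attention
        mixed = mixed + self.adapter(mixed)                       # residual lightweight adapter
        pooled = mixed.mean(dim=1)                                # simple token pooling
        return self.action_head(pooled)                           # predicted action


if __name__ == "__main__":
    model = TactileAdapterFusion()
    video = torch.randn(2, 16, 768)     # batch of 2, 16 video tokens
    tactile = torch.randn(2, 8, 32)     # batch of 2, 8 tactile readings
    print(model(video, tactile).shape)  # torch.Size([2, 7])
```

Only the tactile projection, adapter, and action head carry gradients here, which is one plausible reading of "lightweight modality transfer finetuning"; the actual VTAM architecture is described in the paper itself.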