[2603.23481] VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

[2603.23481] VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2603.23481: VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs

Computer Science > Robotics arXiv:2603.23481 (cs) [Submitted on 24 Mar 2026] Title:VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs Authors:Haoran Yuan, Weigang Yi, Zhenyu Zhang, Wendi Chen, Yuchen Mo, Jiashi Yin, Xinzhuo Li, Xiangyu Zeng, Chuan Wen, Cewu Lu, Katherine Driggs-Campbell, Ismini Lourentzou View a PDF of the paper titled VTAM: Video-Tactile-Action Models for Complex Physical Interaction Beyond VLAs, by Haoran Yuan and 11 other authors View PDF Abstract:Video-Action Models (VAMs) have emerged as a promising framework for embodied intelligence, learning implicit world dynamics from raw video streams to produce temporally consistent action predictions. Although such models demonstrate strong performance on long-horizon tasks through visual reasoning, they remain limited in contact-rich scenarios where critical interaction states are only partially observable from vision alone. In particular, fine-grained force modulation and contact transitions are not reliably encoded in visual tokens, leading to unstable or imprecise behaviors. To bridge this gap, we introduce the Video-Tactile Action Model (VTAM), a multimodal world modeling framework that incorporates tactile perception as a complementary grounding signal. VTAM augments a pretrained video transformer with tactile streams via a lightweight modality transfer finetuning, enabling efficient cross-modal representation learning without tactile-language paired data or independent tactil...

Originally published on March 25, 2026. Curated by AI News.

Related Articles

Machine Learning

[R] ICML Anonymized git repos for rebuttal

A number of the papers I'm reviewing for have submitted additional figures and code through anonymized git repos (e.g. https://anonymous....

Reddit - Machine Learning · 1 min ·
Llms

[R] Reference model free behavioral discovery of AudiBench model organisms via Probe-Mediated Adaptive Auditing

Anthropic's AuditBench - 56 Llama 3.3 70B models with planted hidden behaviors - their best agent detects the behaviros 10-13% of the tim...

Reddit - Machine Learning · 1 min ·
UMKC Announces New Master of Science in Artificial Intelligence
Ai Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·
Llms

[P] Dante-2B: I'm training a 2.1B bilingual fully open Italian/English LLM from scratch on 2×H200. Phase 1 done — here's what I've built.

The problem If you work with Italian text and local models, you know the pain. Every open-source LLM out there treats Italian as an after...

Reddit - Machine Learning · 1 min ·
More in Machine Learning: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime