[2602.15873] Test-Time Adaptation for Tactile-Vision-Language Models
Summary
This paper presents a novel approach to test-time adaptation (TTA) for tactile-vision-language (TVL) models, addressing challenges posed by modality shifts in real-world applications. It introduces a reliability-aware framework that enhances performance by estimating per-modality reliability, filtering unreliable test samples, and adapting feature fusion accordingly.
Why It Matters
As tactile-vision-language models are increasingly deployed in robotics and multimodal perception tasks, ensuring their robustness against test-time distribution shifts is crucial. This research highlights the importance of modeling modality-wise reliability, which can significantly improve the performance of these systems in dynamic environments.
Key Takeaways
- The proposed reliability-aware framework estimates modality reliability from prediction uncertainty.
- The framework filters unreliable test samples and adapts feature fusion based on reliability.
- Significant accuracy improvements (up to 49.9%) were achieved under severe modality corruptions.
- The study emphasizes the need for explicit reliability modeling in TTA for TVL models.
- This approach can enhance the robustness of robotic systems in real-world applications.
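The takeaways above describe three uses of a shared reliability signal: scoring each modality from its prediction uncertainty, filtering test samples whose modalities are unreliable, and weighting the fusion of modality features. The sketch below illustrates that general idea with entropy-based reliability scores; it is a minimal illustration, not the authors' implementation, and all function names, the entropy-based score, and the filtering threshold are assumptions for exposition.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def modality_reliability(logits):
    """Map a modality's prediction uncertainty to a score in [0, 1].

    Uses normalized entropy as an illustrative uncertainty proxy:
    a confident (peaked) prediction gives a score near 1, a
    near-uniform prediction gives a score near 0.
    """
    p = softmax(logits)
    entropy = -(p * np.log(p + 1e-12)).sum(axis=-1)
    max_entropy = np.log(p.shape[-1])
    return 1.0 - entropy / max_entropy

def keep_sample(reliabilities, threshold=0.3):
    """Filter a test sample out if its average modality reliability is low."""
    return float(np.mean(reliabilities)) >= threshold

def fuse(features, reliabilities):
    """Reliability-weighted fusion of per-modality feature vectors."""
    w = np.asarray(reliabilities, dtype=float)
    w = w / w.sum()
    return sum(wi * f for wi, f in zip(w, features))
```

For example, a corrupted tactile stream that yields near-uniform class probabilities would receive a reliability near zero, so its features are down-weighted in `fuse` and heavily corrupted samples are dropped by `keep_sample` rather than driving the test-time update.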
Computer Science > Robotics. arXiv:2602.15873 (cs). [Submitted on 31 Jan 2026]
Title: Test-Time Adaptation for Tactile-Vision-Language Models
Authors: Chuyang Ye, Haoxian Jing, Qinting Jiang, Yixi Lin, Qiang Li, Xing Tang, Jingyan Jiang
Abstract: Tactile-vision-language (TVL) models are increasingly deployed in real-world robotic and multimodal perception tasks, where test-time distribution shifts are unavoidable. Existing test-time adaptation (TTA) methods provide filtering in unimodal settings but lack explicit treatment of modality-wise reliability under asynchronous cross-modal shifts, leaving them brittle when some modalities become unreliable. We study TTA for TVL models under such shifts and propose a reliability-aware framework that estimates per-modality reliability from prediction uncertainty and perturbation-based responses. This shared reliability signal is used to (i) filter unreliable test samples, (ii) adaptively fuse tactile, visual, and language features, and (iii) regularize test-time optimization with a reliability-guided objective. On the TAG-C benchmark and additional TVL scenarios, our approach consistently outperforms strong TTA baselines, achieving accuracy gains of up to 49.9% under severe modality corruptions, underscoring the importance of explicit modality-wise reliability modeling for robust test-time adaptation.