[2603.29759] TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.29759 (cs) [Submitted on 31 Mar 2026]

Title: TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios
Authors: Qiucheng Yu, Ruijie Xu, Mingang Chen, Xuequan Lu, Jianfeng Dong, Chaochao Lu, Xin Tan

Abstract: Recent advances in vision-language models (VLMs) have accelerated their application to indoor safety hazard assessment. However, existing benchmarks suffer from three fundamental limitations: (1) heavy reliance on synthetic datasets constructed via simulation software, creating a significant domain gap with real-world environments; (2) oversimplified safety tasks with artificial constraints on hazard and scene types, limiting model generalization; and (3) the absence of rigorous evaluation protocols to thoroughly assess model capabilities in complex home safety scenarios. To address these challenges, we introduce TSHA (Trustworthy Safety Hazards Assessment), a comprehensive benchmark comprising 81,809 carefully curated training samples drawn from four complementary sources: existing indoor datasets, internet images, AIGC images, and newly captured images. The benchmark also includes a highly challenging test set of 1,707 samples, com...