[2603.29759] TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios

arXiv:2603.29759 (cs), Computer Vision and Pattern Recognition. Submitted on 31 Mar 2026.

Authors: Qiucheng Yu, Ruijie Xu, Mingang Chen, Xuequan Lu, Jianfeng Dong, Chaochao Lu, Xin Tan

Abstract: Recent advances in vision-language models (VLMs) have accelerated their application to indoor safety hazards assessment. However, existing benchmarks suffer from three fundamental limitations: (1) heavy reliance on synthetic datasets constructed via simulation software, creating a significant domain gap with real-world environments; (2) oversimplified safety tasks with artificial constraints on hazard and scene types, thereby limiting model generalization; and (3) absence of rigorous evaluation protocols to thoroughly assess model capabilities in complex home safety scenarios.

To address these challenges, we introduce TSHA (Trustworthy Safety Hazards Assessment), a comprehensive benchmark comprising 81,809 carefully curated training samples drawn from four complementary sources: existing indoor datasets, internet images, AIGC images, and newly captured images. This benchmark set also includes a highly challenging test set with 1707 samples, com...

Originally published on April 01, 2026. Curated by AI News.
