[2603.29759] TSHA: A Benchmark for Visual Language Models in

[2603.29759] TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios

arXiv - AI April 01, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.29759: TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios

Computer Science > Computer Vision and Pattern Recognition arXiv:2603.29759 (cs) [Submitted on 31 Mar 2026] Title:TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios Authors:Qiucheng Yu, Ruijie Xu, Mingang Chen, Xuequan Lu, Jianfeng Dong, Chaochao Lu, Xin Tan View a PDF of the paper titled TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios, by Qiucheng Yu and 6 other authors View PDF HTML (experimental) Abstract:Recent advances in vision-language models (VLMs) have accelerated their application to indoor safety hazards assessment. However, existing benchmarks suffer from three fundamental limitations: (1) heavy reliance on synthetic datasets constructed via simulation software, creating a significant domain gap with real-world environments; (2) oversimplified safety tasks with artificial constraints on hazard and scene types, thereby limiting model generalization; and (3) absence of rigorous evaluation protocols to thoroughly assess model capabilities in complex home safety scenarios. To address these challenges, we introduce TSHA (\textbf{T}rustworthy \textbf{S}afety \textbf{H}azards \textbf{A}ssessment), a comprehensive benchmark comprising 81,809 carefully curated training samples drawn from four complementary sources: existing indoor datasets, internet images, AIGC images, and newly captured images. This benchmark set also includes a highly challenging test set with 1707 samples, com...

Originally published on April 01, 2026. Curated by AI News.

Llms

Gemma 4 actually running usable on an Android phone (not llama.cpp)

I wanted a real local assistant on my phone, not a demo. First tried the usual llama.cpp in Termux — Gemma 4 was 2–3 tok/s and the phone ...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

Llms

Claude vs Gemini: Solving the laden knight's tour problem

AI Coding contest day 8 The eighth challenge is a weighted variant of the classic knight's tour. The knight must visit every square of a ...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

Llms

AI helped me build a custom PC and 4 apps in 6 months with zero coding experience

Mid-October, early morning at work. I was hunting for a podcast to throw on while I worked and stumbled into something about what AI coul...

Reddit - Artificial Intelligence · 1 min · about 6 hours ago

Llms

I thought of something while cooking up a simple RL AI. Please Validate it. [R]

So, I was trying to build a simple AI when I thought of, 'How could I give an AI some emotions? ' This led to one thing after another, an...

Reddit - Machine Learning · 1 min · about 9 hours ago

[2603.29759] TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios

About this article

Related Articles

Gemma 4 actually running usable on an Android phone (not llama.cpp)

Claude vs Gemini: Solving the laden knight's tour problem

AI helped me build a custom PC and 4 apps in 6 months with zero coding experience

I thought of something while cooking up a simple RL AI. Please Validate it. [R]

No comments

Stay updated with AI News