[2506.03922] HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

arXiv - AI · 4 min read · Article

Summary

HSSBench introduces a benchmark for evaluating Multimodal Large Language Models (MLLMs) in the Humanities and Social Sciences, addressing a gap left by current assessments, which focus mainly on STEM disciplines.

Why It Matters

The benchmark matters because it surfaces the distinct challenges MLLMs face in the Humanities and Social Sciences, where tasks demand interdisciplinary thinking and the ability to connect abstract concepts with visual representations. By broadening evaluation beyond STEM, it encourages further research and development in these fields.

Key Takeaways

  • HSSBench is designed to assess MLLMs' capabilities in Humanities and Social Sciences.
  • The benchmark includes over 13,000 samples across six categories, emphasizing interdisciplinary knowledge.
  • Current MLLMs face significant challenges when evaluated against HSSBench.
  • A novel data generation pipeline involving domain experts enhances sample quality.
  • The benchmark aims to inspire research into improving cross-disciplinary reasoning in MLLMs.

Computer Science > Computation and Language

arXiv:2506.03922 (cs)

[Submitted on 4 Jun 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

Authors: Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated significant potential to advance a broad range of domains. However, current benchmarks for evaluating MLLMs primarily emphasize general knowledge and vertical step-by-step reasoning typical of STEM disciplines, while overlooking the distinct needs and potential of the Humanities and Social Sciences (HSS). Tasks in the HSS domain require more horizontal, interdisciplinary thinking and a deep integration of knowledge across related fields, which presents unique challenges for MLLMs, particularly in linking abstract concepts with corresponding visual representations. Addressing this gap, we present HSSBench, a dedicated benchmark designed to assess the capabilities of MLLMs on HSS tasks in multiple languages, including the six official languages of the United Nations…
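To make the evaluation setup concrete, here is a minimal sketch of how a multiple-choice multimodal benchmark like HSSBench might be scored per category and per language. Everything in it is an assumption: the JSON-lines format, the field names (`category`, `language`, `options`, `answer`), and the `model_answer` stand-in are hypothetical, since this summary does not specify the paper's actual release format or evaluation protocol.

```python
# Hypothetical scoring harness for a benchmark like HSSBench.
# Field names and file format are assumptions, not the paper's spec.
import json
from collections import defaultdict

def load_samples(path):
    """Read benchmark samples from a JSON-lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

def model_answer(sample):
    """Stand-in for an MLLM call; replace with a real client.
    This dummy always picks option 'A' so the script stays runnable."""
    return "A"

def evaluate(path):
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in load_samples(path):
        # Bucket results by (category, language), mirroring the kind of
        # cross-disciplinary, multilingual breakdown the benchmark emphasizes.
        key = (sample["category"], sample["language"])
        total[key] += 1
        if model_answer(sample) == sample["answer"]:
            correct[key] += 1
    for key in sorted(total):
        print(f"{key}: {correct[key] / total[key]:.1%} ({total[key]} samples)")

if __name__ == "__main__":
    evaluate("hssbench_samples.jsonl")  # hypothetical file name
```

A per-category, per-language table like the one this loop prints is the natural way to expose the paper's central claim that MLLMs lag on HSS-style, interdisciplinary tasks even when aggregate accuracy looks reasonable.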

