[2506.03922] HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

arXiv - AI · 4 min read · Article

Summary

HSSBench introduces a benchmark for evaluating Multimodal Large Language Models (MLLMs) in the Humanities and Social Sciences, addressing a gap left by current assessments, which focus mainly on STEM disciplines.

Why It Matters

The benchmark matters because it surfaces the distinct challenges MLLMs face in the Humanities and Social Sciences, where tasks demand interdisciplinary thinking and the ability to connect abstract concepts with visual representations. By broadening evaluation beyond STEM, it encourages further research and development in these fields.

Key Takeaways

  • HSSBench is designed to assess MLLMs' capabilities in Humanities and Social Sciences.
  • The benchmark includes over 13,000 samples across six categories, emphasizing interdisciplinary knowledge.
  • Current MLLMs face significant challenges when evaluated against HSSBench.
  • A novel data generation pipeline involving domain experts enhances sample quality.
  • The benchmark aims to inspire research into improving cross-disciplinary reasoning in MLLMs.

Computer Science > Computation and Language

arXiv:2506.03922 (cs)

[Submitted on 4 Jun 2025 (v1), last revised 24 Feb 2026 (this version, v2)]

Title: HSSBench: Benchmarking Humanities and Social Sciences Ability for Multimodal Large Language Models

Authors: Zhaolu Kang, Junhao Gong, Jiaxu Yan, Wanke Xia, Yian Wang, Ziwen Wang, Huaxuan Ding, Zhuo Cheng, Wenhao Cao, Zhiyuan Feng, Siqi He, Shannan Yan, Junzhe Chen, Xiaomin He, Chaoya Jiang, Wei Ye, Kaidong Yu, Xuelong Li

Abstract: Multimodal Large Language Models (MLLMs) have demonstrated significant potential to advance a broad range of domains. However, current benchmarks for evaluating MLLMs primarily emphasize general knowledge and vertical step-by-step reasoning typical of STEM disciplines, while overlooking the distinct needs and potential of the Humanities and Social Sciences (HSS). Tasks in the HSS domain require more horizontal, interdisciplinary thinking and a deep integration of knowledge across related fields, which presents unique challenges for MLLMs, particularly in linking abstract concepts with corresponding visual representations. Addressing this gap, we present HSSBench, a dedicated benchmark designed to assess the capabilities of MLLMs on HSS tasks in multiple languages, including the six official languages of the United Nations…
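To make the evaluation setup concrete, here is a minimal sketch of how a multiple-choice multimodal benchmark like HSSBench might be scored per category and per language. Everything in it is an assumption: the JSON-lines format, the field names (`category`, `language`, `options`, `answer`), and the `model_answer` stand-in are hypothetical, since this summary does not specify the paper's actual release format or evaluation protocol.

```python
# Hypothetical scoring harness for a benchmark like HSSBench.
# Field names and file format are assumptions, not the paper's spec.
import json
from collections import defaultdict

def load_samples(path):
    """Read benchmark samples from a JSON-lines file (assumed format)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

def model_answer(sample):
    """Stand-in for an MLLM call; replace with a real client.
    This dummy always picks option 'A' so the script stays runnable."""
    return "A"

def evaluate(path):
    correct = defaultdict(int)
    total = defaultdict(int)
    for sample in load_samples(path):
        # Bucket results by (category, language), mirroring the kind of
        # cross-disciplinary, multilingual breakdown the benchmark emphasizes.
        key = (sample["category"], sample["language"])
        total[key] += 1
        if model_answer(sample) == sample["answer"]:
            correct[key] += 1
    for key in sorted(total):
        print(f"{key}: {correct[key] / total[key]:.1%} ({total[key]} samples)")

if __name__ == "__main__":
    evaluate("hssbench_samples.jsonl")  # hypothetical file name
```

A per-category, per-language table like the one this loop prints is the natural way to expose the paper's central claim that MLLMs lag on HSS-style, interdisciplinary tasks even when aggregate accuracy looks reasonable.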

