[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.15148 (cs)
[Submitted on 16 Oct 2025 (v1), last revised 4 Apr 2026 (this version, v2)]

Title: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Authors: Xingrui Wang, Jiang Liu, Chao Huang, Xiaodong Yu, Ze Wang, Ximeng Sun, Jialian Wu, Alan Yuille, Emad Barsoum, Zicheng Liu

Abstract: Omni-modal large language models (OLLMs) aim to unify audio, vision, and text understanding within a single framework. While existing benchmarks primarily evaluate general cross-modal question-answering ability, it remains unclear whether OLLMs achieve modality-invariant reasoning or exhibit modality-specific biases. We introduce XModBench, a large-scale tri-modal benchmark explicitly designed to measure cross-modal consistency. XModBench comprises 60,828 multiple-choice questions spanning five task families and systematically covers all six modality compositions in question-answer pairs, enabling fine-grained diagnosis of an OLLM's modality-invariant reasoning, modality disparity, and directional imbalance. Experiments show that even the strongest model, Gemini 2.5 Pro, (i) struggles with spatial and temporal reasoning, achieving less than 60% accuracy, (ii) reveals persistent modality disparities, with pe...
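
To make the modality-composition structure concrete, below is a minimal Python sketch. It assumes the six compositions are the ordered (question modality, answer modality) pairs over {text, vision, audio} with distinct modalities, and the consistency_gap helper is a hypothetical disparity score for illustration; neither is taken from the paper's released evaluation code.

```python
from itertools import permutations

# Assumption: the six compositions are ordered (question, answer) pairs
# drawn from three modalities, excluding same-modality pairs.
MODALITIES = ("text", "vision", "audio")
COMPOSITIONS = list(permutations(MODALITIES, 2))  # 6 ordered pairs
assert len(COMPOSITIONS) == 6

def consistency_gap(accuracy_by_composition: dict) -> float:
    """Hypothetical disparity score: spread between the best- and
    worst-performing modality composition for one task family."""
    scores = list(accuracy_by_composition.values())
    return max(scores) - min(scores)

# Example: a model with modality-invariant reasoning would have a gap
# near zero; modality-specific biases inflate it (numbers are made up).
example = {("text", "vision"): 0.72, ("vision", "text"): 0.58,
           ("audio", "text"): 0.49, ("text", "audio"): 0.55,
           ("vision", "audio"): 0.47, ("audio", "vision"): 0.51}
print(consistency_gap(example))  # 0.25
```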