[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models


arXiv - AI 4 min read

About this article


arXiv:2510.15148 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 16 Oct 2025 (v1), last revised 4 Apr 2026 (this version, v2)

Title: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

Authors: Xingrui Wang, Jiang Liu, Chao Huang, Xiaodong Yu, Ze Wang, Ximeng Sun, Jialian Wu, Alan Yuille, Emad Barsoum, Zicheng Liu

Abstract: Omni-modal large language models (OLLMs) aim to unify audio, vision, and text understanding within a single framework. While existing benchmarks primarily evaluate general cross-modal question-answering ability, it remains unclear whether OLLMs achieve modality-invariant reasoning or exhibit modality-specific biases. We introduce XModBench, a large-scale tri-modal benchmark explicitly designed to measure cross-modal consistency. XModBench comprises 60,828 multiple-choice questions spanning five task families and systematically covers all six modality compositions in question-answer pairs, enabling fine-grained diagnosis of an OLLM's modality-invariant reasoning, modality disparity, and directional imbalance. Experiments show that even the strongest model, Gemini 2.5 Pro, (i) struggles with spatial and temporal reasoning, achieving less than 60% accuracy, (ii) reveals persistent modality disparities, with pe...
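To make the idea of "six modality compositions" concrete, here is a minimal sketch. It rests on assumptions: it takes the six compositions to be the ordered pairs of distinct modalities drawn from audio, vision, and text (one modality for the question context, another for the answer options), and it uses simple illustrative stand-ins for directional imbalance and modality disparity. The function names, the record layout, and the toy records are hypothetical. They are not the paper's definitions, code, or results.

```python
from collections import defaultdict
from itertools import permutations

MODALITIES = ("audio", "vision", "text")

# Assumption: a "composition" is an ordered (context, options) pair of
# distinct modalities, which yields 3 * 2 = 6 compositions.
COMPOSITIONS = list(permutations(MODALITIES, 2))


def accuracy_by_composition(records):
    """Return accuracy per composition.

    records: iterable of (context_modality, option_modality, is_correct).
    Compositions with no records are omitted from the result.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for context, options, is_correct in records:
        totals[(context, options)] += 1
        hits[(context, options)] += int(is_correct)
    return {c: hits[c] / totals[c] for c in COMPOSITIONS if totals[c]}


def directional_imbalance(acc, a, b):
    """Illustrative stand-in: accuracy of a->b minus accuracy of b->a."""
    return acc[(a, b)] - acc[(b, a)]


def disparity_spread(acc):
    """Illustrative stand-in: gap between best and worst composition."""
    return max(acc.values()) - min(acc.values())


# Toy records for illustration only, not benchmark data.
toy_records = [
    ("audio", "text", True),
    ("audio", "text", False),
    ("text", "audio", True),
    ("text", "audio", True),
]
toy_acc = accuracy_by_composition(toy_records)
```

Under these assumptions, a model that reasons in a modality-invariant way would score about the same on every composition for the same underlying question, so both the spread and each directional imbalance would be close to zero.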

Originally published on April 07, 2026. Curated by AI News.
