[2507.19634] MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks

arXiv - AI · 4 min read

Summary

MCIF is a new benchmark for evaluating the multimodal, crosslingual instruction-following capabilities of large language models, addressing gaps left by current benchmarks.

Why It Matters

As multimodal large language models (MLLMs) evolve, comprehensive evaluation across languages and modalities is crucial to their development. MCIF provides a structured approach to assessing these models' capabilities, supporting progress toward AI systems that understand and process diverse inputs more effectively.

Key Takeaways

  • MCIF is the first crosslingual benchmark for MLLMs based on scientific talks.
  • It evaluates instruction following across four macro-tasks: recognition, translation, question answering, and summarization (see the code sketch after this list).
  • The benchmark covers three modalities (speech, vision, text) and four languages (English, German, Italian, Chinese).
  • Analysis of 23 models reveals common challenges, indicating areas for future improvement.
  • MCIF is released under a CC-BY 4.0 license to promote open research.
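
To make the benchmark's structure concrete, here is a minimal sketch of an evaluation loop over MCIF's task and language matrix. The field names, the `run_model` stub, and the data schema are illustrative assumptions, not the paper's actual API; the task, language, and modality lists come from the takeaways above.

```python
from itertools import product

# The four macro-tasks, languages, and modalities described in the paper.
TASKS = ["recognition", "translation", "question_answering", "summarization"]
LANGUAGES = ["en", "de", "it", "zh"]  # English, German, Italian, Chinese
MODALITIES = ["speech", "vision", "text"]


def run_model(sample: dict, task: str, target_lang: str) -> str:
    """Hypothetical stand-in for an MLLM call; swap in a real model here."""
    raise NotImplementedError


def evaluate(dataset: list[dict]) -> dict:
    """Collect model outputs for every (task, target language) pair."""
    outputs = {}
    for task, lang in product(TASKS, LANGUAGES):
        # Assumes each sample carries a "task" field; the real schema may differ.
        samples = [s for s in dataset if s.get("task") == task]
        outputs[(task, lang)] = [run_model(s, task, lang) for s in samples]
    return outputs
```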

Computer Science > Computation and Language

arXiv:2507.19634 (cs)

Submitted on 25 Jul 2025 (v1), last revised 19 Feb 2026 (this version, v3)

Title: MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks

Authors: Sara Papi, Maike Züfle, Marco Gaido, Beatrice Savoldi, Danni Liu, Ioannis Douros, Luisa Bentivogli, Jan Niehues

Abstract: Recent advances in large language models have laid the foundation for multimodal LLMs (MLLMs), which unify text, speech, and vision within a single framework. As these models are rapidly evolving toward general-purpose instruction following across diverse and complex tasks, a key frontier is evaluating their crosslingual and multimodal capabilities over both short- and long-form inputs. However, existing benchmarks fall short in evaluating these dimensions jointly: they are often limited to English, mostly focus on a single modality at a time, rely on short-form inputs, or lack human annotations, hindering comprehensive assessment of model performance across languages, modalities, and task complexity. To address these gaps, we introduce MCIF (Multimodal Crosslingual Instruction Following), the first crosslingual human-annotated benchmark based on scientific talks on NLP and beyond. MCIF evaluates instruction following in crosslingual, multimodal settings over dif...
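
Since the benchmark is released under CC-BY 4.0, a plausible way to obtain it is through the Hugging Face `datasets` library. The dataset identifier below is an assumption made for illustration; consult the paper or its arXiv page for the official release location.

```python
# A loading sketch, assuming a Hugging Face release. The dataset id
# "FBK-MT/MCIF" is an assumption; check the paper for the official link.
from datasets import load_dataset

mcif = load_dataset("FBK-MT/MCIF")  # a config/split argument may be required
print(mcif)  # inspect available splits and fields before evaluating
```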
