[2602.12892] RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
Summary
The paper presents RADAR, an evaluation framework for Multi-modal Large Language Models (MLLMs) that diagnoses performance bottlenecks by measuring perception and reasoning abilities in a disentangled manner, without the need for supervised fine-tuning.
Why It Matters
As MLLMs become increasingly integral to AI applications, understanding their capabilities and limitations is crucial. RADAR offers a systematic approach to evaluate these models, potentially guiding improvements in AI performance and efficiency.
Key Takeaways
- RADAR introduces a Soft Discrimination Score to evaluate model abilities without fine-tuning.
- The framework includes a Multi-Modal Mixture Benchmark with over 15,000 samples for comprehensive assessment.
- RADAR reveals asymmetric development in MLLM capabilities, highlighting the need for targeted improvements.
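The exact formula of the Soft Discrimination Score is not given in this summary, but the abstract indicates it scores a pre-trained model's ability without fine-tuning or autoregressive decoding. A minimal sketch of one plausible likelihood-based variant follows: softmax the per-option answer log-likelihoods of a multiple-choice item, and take the probability mass assigned to the correct option as a "soft" score that changes smoothly as the model improves, unlike 0/1 exact-match accuracy. The function name and the temperature parameter are illustrative assumptions, not the paper's definition.

```python
import math

def soft_discrimination_score(option_logliks, correct_idx, temperature=1.0):
    """Hypothetical soft score (assumption, not the paper's formula):
    softmax over per-option answer log-likelihoods; the score is the
    probability mass assigned to the correct option, so it tracks
    gradual ability development instead of flipping 0/1."""
    scaled = [ll / temperature for ll in option_logliks]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    return exps[correct_idx] / sum(exps)

# Example: the model slightly prefers the correct option (index 0),
# yielding a score above chance (0.25 for four options) but below 1.
score = soft_discrimination_score([-1.2, -2.5, -3.0, -3.1], correct_idx=0)
```

Averaging such a score over perception-focused and reasoning-focused subsets of a benchmark would let the two abilities be tracked separately across pre-training checkpoints, which is the kind of disentangled measurement the paper describes.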
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.12892 (cs)
[Submitted on 13 Feb 2026]
Title: RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training
Authors: Yunshuang Nie, Bingqian Lin, Minzhe Niu, Kun Xiang, Jianhua Han, Guowei Huang, Xingyue Quan, Hang Xu, Bokui Chen, Xiaodan Liang
Abstract: Pre-trained Multi-modal Large Language Models (MLLMs) provide a knowledge-rich foundation for post-training by leveraging their inherent perception and reasoning capabilities to solve complex tasks. However, the lack of an efficient evaluation framework impedes the diagnosis of their performance bottlenecks. Current evaluation primarily relies on testing after supervised fine-tuning, which introduces laborious additional training and autoregressive decoding costs. Meanwhile, common pre-training metrics cannot quantify a model's perception and reasoning abilities in a disentangled manner. Furthermore, existing evaluation benchmarks are typically limited in scale or misaligned with pre-training objectives. Thus, we propose RADAR, an efficient ability-centric evaluation framework for Revealing Asymmetric Development of Abilities in MLLM pRe-training. RADAR involves two key components: (1) Soft Discrimination Score, a novel metric for robustly tracking ability development without f...