[2602.18600] MapTab: Can MLLMs Master Constrained Route Planning?
Summary
The paper introduces MapTab, a benchmark for evaluating Multimodal Large Language Models (MLLMs) on constrained route planning tasks, highlighting their current limitations in reasoning capabilities.
Why It Matters
Existing benchmarks do not rigorously assess the constrained reasoning abilities of MLLMs. MapTab addresses this gap with a testbed in which models must combine visual cues from map images with route attributes from structured tabular data.
Key Takeaways
- MapTab is designed to assess MLLMs' constrained reasoning in route planning.
- The benchmark includes 328 images, 196,800 route planning queries, and 3,936 QA queries.
- Current MLLMs struggle with multimodal reasoning, especially under limited visual perception.
- MapTab covers two scenarios: Metromap (metro networks in 160 cities across 52 countries) and Travelmap (168 tourist attractions from 19 countries).
- The findings indicate a need for improved MLLM capabilities in constrained reasoning tasks.
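To make the task concrete, here is a minimal sketch of a constrained route planning query: find the route between two stops that minimizes one chosen attribute. It is not taken from the paper. The toy network, the attribute values, and the names `EDGES`, `build_graph`, and `best_route` are all hypothetical, and Dijkstra's algorithm is used only as one plausible way to compute a reference answer. Only Time and Price are modeled, since how the benchmark scores Comfort and Reliability is not stated here.

```python
import heapq

# Hypothetical edge table: (stop_a, stop_b) -> route attributes.
# Values are invented for illustration; lower is better for both.
EDGES = {
    ("A", "B"): {"time": 4, "price": 2},
    ("B", "D"): {"time": 4, "price": 2},
    ("A", "C"): {"time": 2, "price": 5},
    ("C", "D"): {"time": 3, "price": 5},
}


def build_graph(edges):
    """Turn the edge table into an undirected adjacency list."""
    graph = {}
    for (u, v), attrs in edges.items():
        graph.setdefault(u, []).append((v, attrs))
        graph.setdefault(v, []).append((u, attrs))
    return graph


def best_route(edges, start, goal, attribute):
    """Return (total_cost, path) minimizing the summed attribute, or None."""
    graph = build_graph(edges)
    queue = [(0, start, [start])]  # (cost so far, current stop, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + attrs[attribute], neighbor, path + [neighbor])
                )
    return None  # goal is unreachable from start


print(best_route(EDGES, "A", "D", "time"))   # fastest route
print(best_route(EDGES, "A", "D", "price"))  # cheapest route
```

In this toy network the fastest and the cheapest routes differ, which is the kind of trade-off a constraint such as Time or Price forces a model to resolve.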
Computer Science > Machine Learning
arXiv:2602.18600 (cs)
[Submitted on 20 Feb 2026]
Title: MapTab: Can MLLMs Master Constrained Route Planning?
Authors: Ziqiao Shang, Lingyue Ge, Yang Chen, Shi-Yu Tian, Zhenyu Huang, Wenbo Fu, Yu-Feng Li, Lan-Zhe Guo
Abstract: Systematic evaluation of Multimodal Large Language Models (MLLMs) is crucial for advancing Artificial General Intelligence (AGI). However, existing benchmarks remain insufficient for rigorously assessing their constrained reasoning capabilities. To bridge this gap, we introduce MapTab, a multimodal benchmark specifically designed to evaluate constrained reasoning in MLLMs via route planning tasks. MapTab requires MLLMs to perceive and ground visual cues from map images alongside route attributes (e.g., Time, Price) from structured tabular data. The benchmark encompasses two scenarios: Metromap, covering metro networks in 160 cities across 52 countries, and Travelmap, depicting 168 representative tourist attractions from 19 countries. In total, MapTab comprises 328 images, 196,800 route planning queries, and 3,936 QA queries, all incorporating 4 key constraints: Time, Price, Comfort, and Reliability. Extensive evaluations across 15 representative MLLMs reveal that current models face substantial challenges in constrained multimodal reasoning. Notably, under conditions of limited visual percep...