[2510.26865] Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
Computer Science > Computer Vision and Pattern Recognition

arXiv:2510.26865 (cs)

[Submitted on 30 Oct 2025 (v1), last revised 24 Mar 2026 (this version, v2)]

Title: Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Authors: Fenfen Lin, Yesheng Liu, Haiyu Xu, Chen Yue, Zheqi He, Mingxuan Zhao, Miguel Hu Chen, Jiakang Liu, JG Yao, Xi Yang

Abstract: Reading measurement instruments is effortless for humans and requires relatively little domain expertise, yet it remains surprisingly challenging for current vision-language models (VLMs), as we find in preliminary evaluation. In this work, we introduce MeasureBench, a benchmark on visual measurement reading covering both real-world and synthesized images of various types of measurements, along with an extensible pipeline for data synthesis. Our pipeline procedurally generates a specified type of gauge with controllable visual appearance, enabling scalable variation in key details such as pointers, scales, fonts, lighting, and clutter. Evaluation of popular proprietary and open-weight VLMs shows that even the strongest frontier VLMs struggle with measurement reading in general. We have also conducted preliminary experiments with reinforcement finetuning (RFT) over synthetic data, and find a significant...
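The abstract does not spell out how the synthesis pipeline works, but the idea of procedurally rendering a gauge with a controllable pointer and scale can be illustrated. Below is a minimal sketch, not the authors' code: all function names and parameters (`render_dial`, `vmin`, `vmax`, `n_ticks`, the 270-degree sweep) are hypothetical, and it uses Python with Pillow to render a dial image paired with its ground-truth reading.

```python
# Hypothetical sketch of procedural gauge synthesis (not MeasureBench code):
# render a circular dial whose pointer angle linearly encodes a known
# ground-truth value, yielding image/label pairs at scale.
import math
import random
from PIL import Image, ImageDraw

def render_dial(value, vmin=0.0, vmax=100.0, size=256, n_ticks=11):
    """Render a dial gauge reading `value`; return (image, ground truth)."""
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    cx = cy = size / 2
    r = size * 0.42
    # Assume the dial face spans a 270-degree arc, from 225 deg (lower
    # left) clockwise to -45 deg (lower right).
    start_deg, end_deg = 225.0, -45.0
    draw.ellipse([cx - r, cy - r, cx + r, cy + r], outline="black", width=2)
    # Tick marks and numeric labels at evenly spaced scale values.
    for i in range(n_ticks):
        frac = i / (n_ticks - 1)
        ang = math.radians(start_deg + frac * (end_deg - start_deg))
        x0 = cx + (r - 10) * math.cos(ang)
        y0 = cy - (r - 10) * math.sin(ang)  # image y-axis points down
        x1 = cx + r * math.cos(ang)
        y1 = cy - r * math.sin(ang)
        draw.line([x0, y0, x1, y1], fill="black", width=2)
        label = f"{vmin + frac * (vmax - vmin):g}"
        lx = cx + (r - 28) * math.cos(ang)
        ly = cy - (r - 28) * math.sin(ang)
        draw.text((lx - 8, ly - 5), label, fill="black")
    # Pointer angle is a linear map of the ground-truth value.
    frac = (value - vmin) / (vmax - vmin)
    ang = math.radians(start_deg + frac * (end_deg - start_deg))
    draw.line([cx, cy,
               cx + (r - 20) * math.cos(ang),
               cy - (r - 20) * math.sin(ang)], fill="red", width=4)
    return img, value

# Sample one random reading and save the image/label pair.
val = random.uniform(0, 100)
img, label = render_dial(val)
img.save("gauge.png")
print(f"ground truth: {label:.2f}")
```

A full pipeline in the spirit of the abstract would additionally randomize the visual factors it names, such as pointer style, scale layout, fonts, lighting, and background clutter, and would cover gauge types beyond this single dial.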