[2511.09833] ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking
Subject: Computer Science > Machine Learning (cs)
arXiv:2511.09833
Submitted on 13 Nov 2025 (v1), last revised 20 Mar 2026 (this version, v2)

Title: ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Authors: Lequan Lin, Dai Shi, Andi Han, Feng Chen, Qiuzheng Chen, Jiawen Li, Zhaoyang Li, Jiyuan Li, Zhenbang Sun, Junbin Gao

Abstract: Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language models (LLMs) for annotation, but LLM-generated labels still fall short of human-level quality. To address this problem, we propose the Annotation with Critical Thinking (ACT) data pipeline, where LLMs serve not only as annotators but also as judges to critically identify potential errors. Human effort is then directed towards reviewing only the most "suspicious" cases, significantly improving the human annotation efficiency. Our major contributions are as follows: (1) ACT is applicable to a wide range of domains, including natural language processing (NLP), computer vision (CV), and multimodal understanding, by leveraging multimodal-LLMs (MLLMs). (2) Through empirical studies, we derive 7 insights on how to enhance annotation quality while efficiently reducing...
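
The annotate, judge, then review flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `act_style_triage`, the `review_budget` parameter, the numeric suspicion score, and the toy annotator and judge are all hypothetical stand-ins, and the abstract does not say how the paper scores or ranks "suspicious" cases. In practice the two callables would wrap calls to an LLM or MLLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Annotated:
    item: str
    label: str
    suspicion: float  # assumed convention: higher means the judge trusts the label less


def act_style_triage(
    items: Sequence[str],
    annotate: Callable[[str], str],
    judge: Callable[[str, str], float],
    review_budget: int,
) -> Tuple[List[Annotated], List[Annotated]]:
    """Label every item, score each label, and split the results by suspicion.

    Returns (to_review, accepted): the `review_budget` most suspicious items
    go to a human reviewer, and the rest keep their machine labels.
    """
    scored = []
    for x in items:
        y = annotate(x)                      # model acting as annotator
        scored.append(Annotated(x, y, judge(x, y)))  # model acting as judge
    # Stable sort, most suspicious first.
    ranked = sorted(scored, key=lambda a: a.suspicion, reverse=True)
    return ranked[:review_budget], ranked[review_budget:]


# Toy stand-ins (hypothetical) so the sketch runs without any model access.
def toy_annotate(text: str) -> str:
    return "positive" if "good" in text else "negative"


def toy_judge(text: str, label: str) -> float:
    # Flags a "positive" label on text containing a negation as suspicious.
    return 0.9 if ("not" in text and label == "positive") else 0.1


to_review, accepted = act_style_triage(
    ["good movie", "not good at all", "bad plot"],
    toy_annotate,
    toy_judge,
    review_budget=1,
)
```

Here only "not good at all" is routed to a human, since the toy judge flags its "positive" label. The other two items keep their machine labels.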