[2604.06422] When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
Computer Science > Computation and Language
arXiv:2604.06422 (cs)
[Submitted on 7 Apr 2026]

Title: When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't
Authors: Jonathan Nemitz, Carsten Eickhoff, Junyi Jessy Li, Kyle Mahowald, Michal Golovanevsky, William Rudman

Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and whether models adhere to their introspective reasoning are central challenges for trustworthy deployment. To study this, we introduce the Graded Color Attribution (GCA) dataset, a controlled benchmark designed to elicit decision rules and evaluate participant faithfulness to these rules. GCA consists of line drawings that vary pixel-level color coverage across three conditions: world-knowledge recolorings, counterfactual recolorings, and shapes with no color priors. Using GCA, both VLMs and human participants establish a threshold: the minimum percentage of pixels of a given color an object must have to receive that color label. We then compare these rules with their subsequent color attribution decisions. Our findings reveal that models systematically violate their own introspective rules. For example, GPT-5-mini violates its stated introspection rules in nearly 60% of cases on objects wit...
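A minimal sketch of the thresholding idea the abstract describes: measure what fraction of an object's pixels match a target color, then check the label decision against a stated minimum-coverage rule. The color-matching tolerance, the 50% threshold, and the toy image below are illustrative assumptions, not the paper's actual GCA protocol.

```python
import numpy as np

def color_coverage(image: np.ndarray, object_mask: np.ndarray,
                   target_rgb: tuple[int, int, int], tol: int = 40) -> float:
    """Percentage of object pixels whose RGB value is within `tol` of `target_rgb`."""
    diff = np.abs(image.astype(int) - np.array(target_rgb))
    is_target = (diff <= tol).all(axis=-1) & object_mask
    return 100.0 * is_target.sum() / max(object_mask.sum(), 1)

def follows_stated_rule(coverage: float, stated_threshold: float) -> bool:
    """Faithfulness check: the color label should be applied iff coverage meets the stated threshold."""
    return coverage >= stated_threshold

# Toy example: a 10x10 "apple" where 62 of 100 object pixels are recolored red.
image = np.zeros((10, 10, 3), dtype=np.uint8)
image[:, :, 1] = 200                        # green body
image.reshape(-1, 3)[:62] = (220, 30, 30)   # recolor 62 pixels red
mask = np.ones((10, 10), dtype=bool)

cov = color_coverage(image, mask, target_rgb=(220, 30, 30))
print(f"red coverage: {cov:.0f}% -> call it red? {follows_stated_rule(cov, stated_threshold=50.0)}")
```

In this toy case the coverage is 62%, so a model that stated a 50% rule would be faithful only if it also labels the object red; the paper's benchmark compares such stated thresholds against the models' subsequent attribution decisions.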