[2604.06422] When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

arXiv - AI 4 min read

About this article

Computer Science > Computation and Language

arXiv:2604.06422 (cs) [Submitted on 7 Apr 2026]

Title: When to Call an Apple Red: Humans Follow Introspective Rules, VLMs Don't

Authors: Jonathan Nemitz, Carsten Eickhoff, Junyi Jessy Li, Kyle Mahowald, Michal Golovanevsky, William Rudman

Abstract: Understanding when Vision-Language Models (VLMs) will behave unexpectedly, whether models can reliably predict their own behavior, and whether models adhere to their introspective reasoning are central challenges for trustworthy deployment. To study this, we introduce the Graded Color Attribution (GCA) dataset, a controlled benchmark designed to elicit decision rules and evaluate participant faithfulness to those rules. GCA consists of line drawings that vary pixel-level color coverage across three conditions: world-knowledge recolorings, counterfactual recolorings, and shapes with no color priors. Using GCA, both VLMs and human participants establish a threshold: the minimum percentage of pixels of a given color an object must have to receive that color label. We then compare these rules with their subsequent color attribution decisions. Our findings reveal that models systematically violate their own introspective rules. For example, GPT-5-mini violates its stated introspection rules in nearly 60% of cases on objects wit...
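The consistency check described in the abstract is easy to make concrete. The sketch below is not the paper's code: the helpers (color_coverage, violates_stated_rule) are hypothetical, and the per-channel tolerance for deciding whether a pixel counts as the target color is an assumption the abstract does not specify. It illustrates the measurement being described: compute an object's pixel-level color coverage, then flag an attribution decision that contradicts the stated threshold.

```python
import numpy as np

def color_coverage(pixels: np.ndarray, target_rgb: tuple, tol: int = 40) -> float:
    """Fraction of an object's pixels within `tol` of `target_rgb` on every channel.

    pixels: (N, 3) uint8 array of the object's foreground pixels.
    The tolerance heuristic is an assumption, not the paper's procedure.
    """
    diff = np.abs(pixels.astype(int) - np.asarray(target_rgb))
    return float(np.all(diff <= tol, axis=1).mean())

def violates_stated_rule(coverage: float, stated_threshold: float, gave_label: bool) -> bool:
    """A violation in the paper's sense: the attribution decision contradicts
    the threshold the model (or participant) itself reported, i.e. the color
    label was applied below that threshold, or withheld at or above it."""
    return gave_label != (coverage >= stated_threshold)

# Hypothetical example: the model states "an object must be at least 50% red
# to be called red", then labels a 30%-red apple as red.
red = np.tile([200, 30, 30], (30, 1))    # 30 red pixels
green = np.tile([40, 170, 60], (70, 1))  # 70 green pixels
apple = np.vstack([red, green]).astype(np.uint8)

cov = color_coverage(apple, target_rgb=(200, 30, 30))
print(f"red coverage: {cov:.0%}")                                        # red coverage: 30%
print(violates_stated_rule(cov, stated_threshold=0.5, gave_label=True))  # True
```

Under this framing, the abstract's headline number says GPT-5-mini's attributions fail such a check against its own stated rule in nearly 60% of cases on one condition of the benchmark.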

Originally published on April 09, 2026. Curated by AI News.

Related Articles

OpenAI introduces new 'Trusted Contact' safeguard for cases of possible self-harm | TechCrunch

The company is expanding its efforts to protect ChatGPT users in cases where conversations may turn to self-harm.

TechCrunch - AI · 5 min
Mira Murati’s deposition pulled back the curtain on Sam Altman’s ouster | The Verge

Thanks to Musk v. Altman, the public is getting a concrete look at details of Sam Altman’s ouster from OpenAI, much of it centered on for...

The Verge - AI · 11 min

Diffusion for generating/editing ASTs? [D]

I’m not a machine learning expert or anything, but I do enjoy learning about how it all works. I’ve noticed that one of the main limitati...

Reddit - Machine Learning · 1 min

ChatGPT’s ‘Trusted Contact’ will alert loved ones of safety concerns | The Verge

OpenAI is launching an optional safety feature for ChatGPT that allows adult users to assign an emergency contact for mental health and s...

The Verge - AI · 4 min
