[2602.20068] The Invisible Gorilla Effect in Out-of-distribution Detection
Summary
The paper explores the 'Invisible Gorilla Effect' in out-of-distribution (OOD) detection, revealing that detection performance varies with the visual similarity (e.g. colour) between OOD artefacts and the model's regions of interest (ROI) in images.
Why It Matters
Understanding the biases in OOD detection is crucial for developing more reliable AI systems. This research highlights a significant failure mode that can lead to misclassifications, impacting applications in critical areas such as medical imaging and autonomous systems.
Key Takeaways
- The 'Invisible Gorilla Effect' describes how detection performance improves when an artefact is visually similar (e.g. in colour) to the model's region of interest (ROI), and drops when it is not.
- Detection methods show significant performance drops when artefacts differ in colour from the ROI.
- The study evaluated 40 OOD methods across 7 benchmarks, revealing an overlooked failure mode.
- The authors annotated artefacts by colour in 11,355 images from three public datasets to substantiate the findings.
- Findings provide guidance for developing more robust OOD detection systems.
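The benchmark comparisons above report AUROC, which for OOD detection is the probability that a randomly chosen OOD sample receives a higher detector score than a randomly chosen in-distribution sample. A minimal sketch of that metric (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC of an OOD detector: probability that a random OOD sample
    scores higher than a random in-distribution sample (ties count 0.5)."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    greater = (ood_scores[:, None] > id_scores[None, :]).sum()
    ties = (ood_scores[:, None] == id_scores[None, :]).sum()
    return (greater + 0.5 * ties) / (len(id_scores) * len(ood_scores))

# Toy example: an imperfect detector whose OOD scores are shifted upward.
rng = np.random.default_rng(0)
id_s = rng.normal(0.0, 1.0, 500)
ood_s = rng.normal(1.5, 1.0, 500)
print(round(auroc(id_s, ood_s), 3))
```

A 31.5% AUROC gap between red-ink and black-ink artefacts, as the paper reports for the Mahalanobis Score, is therefore a large difference in this ranking probability, not a small calibration effect.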
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.20068 (cs) [Submitted on 23 Feb 2026]
Title: The Invisible Gorilla Effect in Out-of-distribution Detection
Authors: Harry Anthony, Ziyun Liang, Hermione Warr, Konstantinos Kamnitsas
Abstract: Deep Neural Networks achieve high performance in vision tasks by learning features from regions of interest (ROI) within images, but their performance degrades when deployed on out-of-distribution (OOD) data that differs from training data. This challenge has led to OOD detection methods that aim to identify and reject unreliable predictions. Although prior work shows that OOD detection performance varies by artefact type, the underlying causes remain underexplored. To this end, we identify a previously unreported bias in OOD detection: for hard-to-detect artefacts (near-OOD), detection performance typically improves when the artefact shares visual similarity (e.g. colour) with the model's ROI and drops when it does not - a phenomenon we term the Invisible Gorilla Effect. For example, in a skin lesion classifier with red lesion ROI, we show the method Mahalanobis Score achieves a 31.5% higher AUROC when detecting OOD red ink (similar to ROI) compared to black ink (dissimilar) annotations. We annotated artefacts by colour in 11,355 images from three public datase...
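The Mahalanobis Score mentioned in the abstract scores a sample by its Mahalanobis distance, in feature space, from a Gaussian fitted to in-distribution training features; larger distance suggests OOD. A simplified single-Gaussian sketch (the full method fits per-class Gaussians over network features; names here are illustrative):

```python
import numpy as np

def fit_gaussian(features):
    """Fit a Gaussian to in-distribution feature vectors
    (single-distribution simplification of the per-class version)."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    prec = np.linalg.pinv(cov)  # pseudo-inverse for numerical safety
    return mu, prec

def mahalanobis_score(x, mu, prec):
    """Squared Mahalanobis distance of x from the fitted Gaussian.
    Larger value => more likely OOD."""
    d = x - mu
    return float(d @ prec @ d)

rng = np.random.default_rng(1)
train_feats = rng.normal(0.0, 1.0, size=(1000, 8))  # stand-in for network features
mu, prec = fit_gaussian(train_feats)
in_sample = rng.normal(0.0, 1.0, size=8)
ood_sample = rng.normal(5.0, 1.0, size=8)  # far from the training distribution
print(mahalanobis_score(in_sample, mu, prec) < mahalanobis_score(ood_sample, mu, prec))
```

Because the score lives entirely in feature space, it inherits whatever the backbone attends to: artefacts resembling the ROI perturb those features strongly, which is one plausible mechanism behind the colour-dependent gap the paper measures.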