[2602.14834] Debiasing Central Fixation Confounds Reveals a Peripheral "Sweet Spot" for Human-like Scanpaths in Hard-Attention Vision
Summary
This paper examines how central fixation bias distorts the evaluation of human-like scanpaths in hard-attention vision models, and proposes a center-debiased metric, the Gaze Consistency Score (GCS), to correct for it.
Why It Matters
Understanding how central fixation confounds can mislead evaluations of visual attention models is crucial for advancing computer vision. The proposed Gaze Consistency Score (GCS) offers a refined approach to assess model performance, enhancing the design of gaze benchmarks and improving alignment with human visual behavior.
Key Takeaways
- Central fixation bias can skew evaluations of vision models.
- The Gaze Consistency Score (GCS) provides a center-debiased metric for assessing scanpaths.
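- Only a narrow range of foveal patch sizes and peripheral context (a peripheral "sweet spot") yields scanpaths that both beat the center baseline after debiasing and show human-like movement statistics.
- Standard scanpath metrics can be optimistic: on Gaze-CIFAR-10, a trivial center-fixation baseline scores close to many learned policies.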
- The findings have implications for designing better gaze benchmarks in AI.
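The center-bias confound in the takeaways above can be illustrated with a toy sketch. This is not the paper's GCS, whose definition does not appear in this excerpt. The similarity function, the rescaling against a center baseline, the 32x32 image size, and the fixation coordinates are all assumptions chosen for illustration.

```python
import math


def scanpath_similarity(pred, human, diag):
    """Hypothetical metric: mean of (1 - distance / image diagonal) over aligned fixations."""
    n = min(len(pred), len(human))
    if n == 0:
        raise ValueError("scanpaths must be non-empty")
    total = 0.0
    for (px, py), (hx, hy) in zip(pred[:n], human[:n]):
        dist = math.hypot(px - hx, py - hy)
        total += 1.0 - min(dist / diag, 1.0)
    return total / n


def center_baseline(length, width, height):
    """Trivial policy: fixate the image center at every step."""
    return [(width / 2.0, height / 2.0)] * length


def center_debiased(model_score, baseline_score):
    """Assumed rescaling: center baseline maps to 0, a perfect match maps to 1."""
    if baseline_score >= 1.0:
        return 0.0  # baseline already perfect, nothing left to gain
    return (model_score - baseline_score) / (1.0 - baseline_score)


# Toy data (made up): fixations in pixel coordinates on an assumed 32x32 image.
W = H = 32
DIAG = math.hypot(W, H)
human = [(16, 16), (18, 14), (20, 15), (17, 19)]
model = [(16, 16), (17, 15), (15, 17), (16, 18)]

raw_center = scanpath_similarity(center_baseline(len(human), W, H), human, DIAG)
raw_model = scanpath_similarity(model, human, DIAG)
debiased = center_debiased(raw_model, raw_center)

print(f"center baseline (raw): {raw_center:.3f}")
print(f"model (raw):           {raw_model:.3f}")
print(f"model (debiased):      {debiased:.3f}")
```

When human fixations cluster near the image center, both raw scores sit close to 1 and are hard to tell apart. Rescaling against the center baseline spreads them out, so a model only scores well if it improves on always looking at the center.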
arXiv:2602.14834 (cs), Computer Science > Computer Vision and Pattern Recognition
Submitted on 16 Feb 2026
Title: Debiasing Central Fixation Confounds Reveals a Peripheral "Sweet Spot" for Human-like Scanpaths in Hard-Attention Vision
Authors: Pengcheng Pan, Yonekura Shogo, Yasuo Kuniyoshi
Abstract: Human eye movements in visual recognition reflect a balance between foveal sampling and peripheral context. Task-driven hard-attention models for vision are often evaluated by how well their scanpaths match human gaze. However, common scanpath metrics can be strongly confounded by dataset-specific center bias, especially on object-centric datasets. Using Gaze-CIFAR-10, we show that a trivial center-fixation baseline achieves surprisingly strong scanpath scores, approaching many learned policies. This makes standard metrics optimistic and blurs the distinction between genuine behavioral alignment and mere central tendency. We then analyze a hard-attention classifier under constrained vision by sweeping foveal patch size and peripheral context, revealing a peripheral sweet spot: only a narrow range of sensory constraints yields scanpaths that are simultaneously (i) above the center baseline after debiasing and (ii) temporally human-like in movement statistics. To addr...