[2512.18951] Benchmarking Attribute Discrimination in Infant-Scale Vision-Language Models
Computer Science > Machine Learning
arXiv:2512.18951 (cs)
[Submitted on 22 Dec 2025 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: Benchmarking Attribute Discrimination in Infant-Scale Vision-Language Models
Authors: Patrick Batsell, Tsutsui Satoshi, Bihan Wen

Abstract: Infants learn not only object categories but also fine-grained visual attributes such as color, size, and texture from limited experience. Prior infant-scale vision-language models have mainly been evaluated on object recognition, leaving open whether they support within-class attribute discrimination. We introduce a controlled benchmark that varies color, size, and texture across 67 everyday object classes using synthetic rendering to decouple attribute values from object identity. We evaluate infant-trained models (CVCL and an infant-trained DINO baseline) against web-scale and ImageNet models (CLIP, SigLIP, ResNeXt) under two complementary settings: an image-only prototype test and a text-vision test with attribute-object prompts. We find a dissociation between visual and linguistic attribute information: infant-trained models form strong visual representations for size and texture but perform poorly on visual color discrimination, and in the text-vision setting they struggle to ground color and show only modest size g...
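The image-only prototype test described in the abstract can be sketched as nearest-prototype classification in a frozen encoder's embedding space: one prototype per attribute value (e.g. each color) is averaged from support renders, and a query image is assigned the value of its most cosine-similar prototype. The sketch below is an assumption about the protocol, not the paper's code; the Gaussian `embed` function stands in for a real vision encoder such as CVCL or CLIP.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    """L2-normalize rows so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Hypothetical stand-in for a frozen vision encoder: embeddings are drawn
# from a class-conditioned Gaussian instead of encoding real renders.
D = 64
attr_values = ["red", "green", "blue"]  # example attribute: color
centers = {v: rng.normal(size=D) for v in attr_values}

def embed(value, n):
    return normalize(centers[value] + 0.3 * rng.normal(size=(n, D)))

# Prototype per attribute value: mean embedding of support renders.
prototypes = normalize(np.stack([embed(v, 16).mean(axis=0) for v in attr_values]))

def classify(query_emb):
    # Cosine similarity to each prototype; predict the nearest one.
    sims = query_emb @ prototypes.T
    return [attr_values[i] for i in sims.argmax(axis=1)]

# Query renders of a held-out "green" object should map to "green".
preds = classify(embed("green", 8))
acc = float(np.mean([p == "green" for p in preds]))
```

With a real encoder, `embed` would be replaced by the model's image tower applied to the benchmark's synthetic renders; the text-vision setting would instead compare image embeddings against text embeddings of attribute-object prompts like "a red cup".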