[2603.04419] Context-Dependent Affordance Computation in Vision-Language Models
Computer Science > Computation and Language

arXiv:2603.04419 (cs) [Submitted on 14 Feb 2026]

Title: Context-Dependent Affordance Computation in Vision-Language Models

Authors: Murad Farzulla

Abstract: We characterize the phenomenon of context-dependent affordance computation in vision-language models (VLMs). Through a large-scale computational study (n = 3,213 scene-context pairs from COCO-2017) using Qwen-VL 30B and LLaVA-1.5-13B subjected to systematic context priming across 7 agentic personas, we demonstrate massive affordance drift: mean Jaccard similarity between context conditions is 0.095 (95% CI: [0.093, 0.096], p < 0.0001), indicating that >90% of lexical scene description is context-dependent. Sentence-level cosine similarity confirms substantial drift at the semantic level (mean = 0.415, 58.5% context-dependent). Stochastic baseline experiments (2,384 inference runs across 4 temperatures and 5 seeds) confirm that this drift reflects genuine context effects rather than generation noise: within-prime variance is substantially lower than cross-prime variance across all conditions. Tucker decomposition with bootstrap stability analysis (n = 1,000 resamples) reveals stable orthogonal latent factors: a "Culinary Manifold" isolated to chef contexts and an "Access Axis" spanning child-mobility contrasts. These findings establish that VLMs c...
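The abstract's headline drift metric is a lexical Jaccard similarity averaged over pairs of context conditions, with a bootstrap confidence interval. The sketch below is a minimal illustration of that measurement, not the authors' released code: the persona names, tokenization scheme, and toy data are assumptions made for the example.

```python
"""Minimal sketch of cross-prime lexical drift: mean pairwise Jaccard
similarity between scene descriptions generated under different context
primes, with a percentile bootstrap CI over scenes. Persona names,
tokenization, and data layout are illustrative assumptions."""
from itertools import combinations
import numpy as np


def tokens(text: str) -> set[str]:
    """Very simple lexical normalization: lowercase + whitespace split."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 = identical vocabulary)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cross_prime_jaccard(descriptions: dict[str, str]) -> float:
    """Mean pairwise Jaccard across context primes for a single scene.

    A low mean (e.g. ~0.1) means most lexical content changes with the prime.
    """
    sets = [tokens(t) for t in descriptions.values()]
    return float(np.mean([jaccard(a, b) for a, b in combinations(sets, 2)]))


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-scene similarities."""
    rng = np.random.default_rng(seed)
    vals = np.asarray(values)
    means = [rng.choice(vals, size=len(vals), replace=True).mean()
             for _ in range(n_resamples)]
    return float(np.quantile(means, alpha / 2)), float(np.quantile(means, 1 - alpha / 2))


if __name__ == "__main__":
    # Toy example: two scenes, three (of the seven) hypothetical personas.
    scenes = [
        {"chef": "a cutting board with onions ready to dice",
         "child": "a big table with round things to play with",
         "mobility": "a clear counter reachable from a seated position"},
        {"chef": "a skillet and tongs beside the stove",
         "child": "shiny tools that look fun but are not for kids",
         "mobility": "handles within easy reach of a wheelchair"},
    ]
    per_scene = [cross_prime_jaccard(s) for s in scenes]
    lo, hi = bootstrap_ci(per_scene)
    print(f"mean cross-prime Jaccard = {np.mean(per_scene):.3f} "
          f"(95% CI [{lo:.3f}, {hi:.3f}])")
```

The same per-scene aggregation would apply to the semantic-level metric by replacing token-set Jaccard with cosine similarity between sentence embeddings, and the within-prime vs. cross-prime variance comparison follows by computing these similarities within repeated runs of one prime versus across primes.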