[2602.20918] Predicting Sentence Acceptability Judgments in Multimodal Contexts

arXiv - AI · 4 min read

Summary

This paper explores how visual context influences sentence acceptability judgments in humans and large language models (LLMs), revealing that visual images have minimal impact on human ratings, while LLMs show varied performance based on context.

Why It Matters

Understanding how multimodal contexts affect sentence acceptability is crucial for advancing natural language processing and improving the design of AI systems. This research highlights the differences in processing between humans and LLMs, which can inform future AI development and applications in language understanding.

Key Takeaways

  • Visual context has little impact on human sentence acceptability judgments.
  • LLMs can predict human acceptability judgments with high accuracy, especially without visual context.
  • Different LLMs exhibit varying patterns in sentence acceptability, with some closely resembling human judgments.
  • The presence of visual contexts decreases the correlation between LLM acceptability predictions and their normalised log probabilities.
  • This study provides insights into the processing differences between humans and LLMs in multimodal contexts.
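The normalised log probabilities mentioned above can be sketched with a toy example: a sentence's score is its mean per-token log probability, which is then correlated with human acceptability ratings. Everything below is invented for illustration; the paper derives these scores from LLM token log probabilities, not from the hypothetical unigram model used here.

```python
import math

# Toy unigram "language model": word -> probability. These probabilities,
# the sentences, and the ratings are all hypothetical stand-ins for the
# LLM-derived scores and human judgments used in the paper.
UNIGRAM = {"the": 0.07, "cat": 0.01, "sat": 0.005, "on": 0.03,
           "mat": 0.002, "colorless": 1e-6, "ideas": 1e-4, "sleep": 1e-4}

def normalised_log_prob(sentence, model, oov=1e-8):
    """Length-normalised log probability: mean per-token log p(w).
    Dividing by length keeps longer sentences from scoring lower
    merely because they contain more tokens."""
    tokens = sentence.lower().split()
    return sum(math.log(model.get(t, oov)) for t in tokens) / len(tokens)

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

sentences = [
    "the cat sat on the mat",   # ordinary, high-probability words
    "colorless ideas sleep",    # rare words -> very low score
    "the cat sleep on mat",     # mildly degraded
]
# Hypothetical human acceptability ratings on a 1-5 scale.
ratings = [4.7, 1.5, 3.2]

scores = [normalised_log_prob(s, UNIGRAM) for s in sentences]
r = pearson(scores, ratings)  # strongly positive on this toy data
```

A unigram model is blind to word order, so this only illustrates the scoring and correlation machinery; an LLM's autoregressive log probabilities additionally penalise ungrammatical orderings.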

Computer Science > Artificial Intelligence

arXiv:2602.20918 (cs) [Submitted on 24 Feb 2026]

Title: Predicting Sentence Acceptability Judgments in Multimodal Contexts

Authors: Hyewon Jang, Nikolai Ilinykh, Sharid Loáiciga, Jey Han Lau, Shalom Lappin

Abstract: Previous work has examined the capacity of deep neural networks (DNNs), particularly transformers, to predict human sentence acceptability judgments, both independently of context and in document contexts. We consider the effect of prior exposure to visual images (i.e., visual context) on these judgments for humans and large language models (LLMs). Our results suggest that, in contrast to textual context, visual images appear to have little if any impact on human acceptability ratings. However, LLMs display the compression effect seen in previous work on human judgments in document contexts. Different sorts of LLMs are able to predict human acceptability judgments to a high degree of accuracy, but in general, their performance is slightly better when visual contexts are removed. Moreover, the distribution of LLM judgments varies among models, with Qwen resembling human patterns and others diverging from them. LLM-generated predictions on sentence acceptability are highly correlated with their normalised log probabilities in general. However, the correlations d...

Related Articles

[2603.29957] Think Anywhere in Code Generation
arXiv - Machine Learning · 3 min

[2603.16880] NeuroNarrator: A Generalist EEG-to-Text Foundation Model for Clinical Interpretation via Spectro-Spatial Grounding and Temporal State-Space Reasoning
arXiv - Machine Learning · 4 min

[2512.21106] Semantic Refinement with LLMs for Graph Representations
arXiv - Machine Learning · 4 min

[2511.18123] Bias Is a Subspace, Not a Coordinate: A Geometric Rethinking of Post-hoc Debiasing in Vision-Language Models
arXiv - Machine Learning · 4 min