[2507.12553] Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Summary
This paper examines how language models (LMs) represent event plausibility, finding that LM representations encode modal categories more reliably than prior work suggested and in a way that aligns with human plausibility judgments.
Why It Matters
Understanding how language models interpret and categorize events is crucial for improving their reliability in applications ranging from question answering to creative story generation. This research connects LM-internal representations to human cognitive judgments, potentially improving the interpretability and trustworthiness of AI systems.
Key Takeaways
- Language models can categorize sentences by modal category (e.g., possible, impossible, nonsensical) more reliably than previously reported.
- Modal difference vectors emerge in a consistent order as models become more competent (across training steps, layers, and parameter counts).
- LM representations correlate with human judgments of event plausibility.
- The study enhances understanding of human-like categorization in AI.
- Mechanistic interpretability techniques provide new insights into LM behavior.
arXiv:2507.12553 [cs.CL] (Computer Science > Computation and Language)
Submitted on 16 Jul 2025 (v1); last revised 25 Feb 2026 (this version, v2)
Title: Is This Just Fantasy? Language Model Representations Reflect Human Judgments of Event Plausibility
Authors: Michael A. Lepori, Jennifer Hu, Ishita Dasgupta, Roma Patel, Thomas Serre, Ellie Pavlick
Abstract: Language models (LMs) are used for a diverse range of tasks, from question answering to writing fantastical stories. In order to reliably accomplish these tasks, LMs must be able to discern the modal category of a sentence (i.e., whether it describes something that is possible, impossible, completely nonsensical, etc.). However, recent studies have called into question the ability of LMs to categorize sentences according to modality (Michaelov et al., 2025; Kauf et al., 2023). In this work, we identify linear representations that discriminate between modal categories within a variety of LMs, or modal difference vectors. Analysis of modal difference vectors reveals that LMs have access to more reliable modal categorization judgments than previously reported. Furthermore, we find that modal difference vectors emerge in a consistent order as models become more competent (i.e., through training steps, layers, and parameter count). Notably, we find that...
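To make the idea of a "modal difference vector" concrete, here is a minimal sketch of one common way such a linear direction could be extracted: take the difference of mean hidden-state representations between two modal categories and project new sentences onto it. This is an illustrative assumption about the general technique, not the paper's exact method; the model choice (gpt2), the example sentences, and the mean-pooling of hidden states are all stand-ins.

```python
# Hedged sketch: a difference-of-means "modal direction" probe on LM hidden states.
# Assumes the Hugging Face transformers library; gpt2 is a stand-in model.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def sentence_rep(sentence: str, layer: int = -1) -> torch.Tensor:
    """Mean-pool one layer's hidden states into a single sentence vector."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer].mean(dim=1).squeeze(0)

# Toy example sentences for two modal categories (illustrative only).
possible = ["The chef boiled the potatoes.", "The cat slept on the sofa."]
impossible = ["The chef boiled the moonlight.", "The sofa slept on the cat."]

pos_mean = torch.stack([sentence_rep(s) for s in possible]).mean(dim=0)
imp_mean = torch.stack([sentence_rep(s) for s in impossible]).mean(dim=0)

# A candidate "modal difference vector": the direction from impossible to possible.
diff_vec = pos_mean - imp_mean

def modal_score(sentence: str) -> float:
    """Project a new sentence onto the difference vector; higher = more 'possible'."""
    return torch.dot(sentence_rep(sentence) - imp_mean, diff_vec).item()

print(modal_score("The dog chased the ball."))
print(modal_score("The ball dreamed a square circle."))
```

In this sketch, scores for held-out sentences could then be compared against graded human plausibility ratings (e.g., via rank correlation), which is the kind of representation-to-judgment comparison the paper's findings concern.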