[2603.20531] Epistemic Observability in Language Models
Computer Science > Distributed, Parallel, and Cluster Computing
arXiv:2603.20531 (cs) [Submitted on 20 Mar 2026]
Title: Epistemic Observability in Language Models
Authors: Tony Mason
Abstract: We find that models report highest confidence precisely when they are fabricating. Across four model families (OLMo-3, Llama-3.1, Qwen3, Mistral), self-reported confidence inversely correlates with accuracy, with AUC ranging from 0.28 to 0.36, where 0.5 is random guessing. We prove, under explicit formal assumptions, that this is not a capability gap but an observational one. Under text-only observation, where a supervisor sees only the model's output text, no monitoring system can reliably distinguish honest model outputs from plausible fabrications. We prove two results: first, that any policy conditioning only on the query cannot satisfy epistemic honesty across ambiguous world states; second, that no learning algorithm optimizing reward from a text-only supervisor can converge to honest behavior when the supervisor's observations are identical for both grounded and fabricated responses. Within our formal model, these impossibilities hold regardless of model scale or training procedure, including RLHF and instruction tuning. We construct a tensor interface that escapes the impossibility by exporting computational byproducts (per-token entropy and log-probability distri...
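To make the reported AUC range concrete: an AUC below 0.5 means that a randomly chosen correct answer tends to receive a *lower* confidence score than a randomly chosen incorrect one. A minimal sketch of this rank interpretation of AUC (the toy data is illustrative, not from the paper):

```python
def auc(scores, labels):
    """Probability that a random positive example outscores a random
    negative one (ties count half) -- the rank interpretation of AUC."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data where the model is most confident precisely when wrong:
conf = [0.9, 0.8, 0.3, 0.2]   # self-reported confidence
right = [0, 0, 1, 1]          # 1 = answer was correct
# auc(conf, right) == 0.0, i.e. confidence perfectly anti-predicts accuracy;
# the paper's measured values of 0.28-0.36 sit between this and chance (0.5).
```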
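The "computational byproducts" the abstract names (per-token entropy and log-probabilities) are quantities already present in a model's forward pass. A hypothetical sketch of extracting them from raw logits; the function name, shapes, and interface here are assumptions for illustration, not the paper's actual API:

```python
import numpy as np

def token_observability(logits, token_ids):
    """Per-token entropy and log-probability of the emitted tokens.

    Illustrative sketch (not the paper's tensor interface):
    logits    -- array of shape (seq_len, vocab_size)
    token_ids -- array of shape (seq_len,), the tokens actually emitted
    """
    # Numerically stable log-softmax: subtract each row's max first.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    # Shannon entropy (in nats) of each position's predictive distribution.
    entropy = -(probs * log_probs).sum(axis=-1)
    # Log-probability the model assigned to the token it actually emitted.
    token_logp = log_probs[np.arange(len(token_ids)), token_ids]
    return entropy, token_logp

# A uniform distribution over 4 tokens has entropy ln(4) ~= 1.386 nats,
# and assigns each token log-probability ln(0.25).
entropy, logp = token_observability(np.zeros((1, 4)), np.array([0]))
```

Exporting these alongside the output text gives a supervisor a signal that, unlike the text itself, can differ between grounded and fabricated responses.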