[2604.13068] Before the First Token: Scale-Dependent Emergence of Hallucination Signals in Autoregressive Language Models
Computer Science > Computation and Language
arXiv:2604.13068 (cs)
[Submitted on 20 Mar 2026]

Title: Before the First Token: Scale-Dependent Emergence of Hallucination Signals in Autoregressive Language Models
Authors: Dip Roy, Rajiv Misra, Sanjay Kumar Singh, Anisha Roy

Abstract: When do large language models decide to hallucinate? Despite the serious consequences of hallucination in healthcare, law, and finance, few formal answers exist. Recent work shows that autoregressive models maintain internal representations distinguishing factual from fictional outputs, but when these representations peak, and how this depends on model scale, remains poorly understood. We study the temporal dynamics of hallucination-indicative internal representations across 7 autoregressive transformers (117M--7B parameters) using three fact-based datasets (TriviaQA, Simple Facts, Biography; 552 labeled examples). We identify a scale-dependent phase transition: models below 400M parameters show chance-level probe accuracy at every generation position (AUC = 0.48--0.67), indicating no reliable factuality signal. Above $\sim$1B parameters, a qualitatively different regime emerges in which peak detectability occurs at position zero -- before any tokens are generated -- and then declines during generation. This pre-generation signal is statistically significant in bo...
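The probing setup the abstract describes (a per-position classifier over hidden states, scored with AUC) can be sketched as follows. This is a hypothetical illustration, not the authors' code: it assumes hidden states of shape `(n_examples, n_positions, d_model)` with binary factuality labels, and uses a simple difference-of-means linear probe rather than whatever classifier the paper trains.

```python
# Hypothetical sketch of position-wise factuality probing (not the paper's code).
# hidden[i, t, :] is the model's hidden state for example i at generation
# position t (t = 0 means before the first output token); labels[i] in {0, 1}
# marks whether the completion was factual.
import numpy as np

def auc_score(y, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation, ties ignored."""
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def positionwise_probe_auc(hidden, labels, train_frac=0.7):
    """Fit a difference-of-means probe at each position; return held-out AUCs."""
    n, n_positions, _ = hidden.shape
    n_train = int(train_frac * n)
    aucs = []
    for t in range(n_positions):
        X_tr, y_tr = hidden[:n_train, t, :], labels[:n_train]
        X_te, y_te = hidden[n_train:, t, :], labels[n_train:]
        # Probe direction: mean(factual) - mean(non-factual) in hidden space.
        w = X_tr[y_tr == 1].mean(axis=0) - X_tr[y_tr == 0].mean(axis=0)
        aucs.append(auc_score(y_te, X_te @ w))
    return aucs

# Synthetic demo mimicking the reported large-model regime: the class signal
# is strongest at position 0 and decays as generation proceeds.
rng = np.random.default_rng(0)
n, n_positions, d_model = 400, 5, 32
labels = rng.integers(0, 2, n)
hidden = rng.normal(size=(n, n_positions, d_model))
for t in range(n_positions):
    hidden[:, t, 0] += (2.0 / (t + 1)) * labels  # decaying signal in one dim

aucs = positionwise_probe_auc(hidden, labels)
print([round(a, 2) for a in aucs])  # highest AUC at position 0, declining after
```

With this synthetic data the probe is near-ceiling at position 0 and degrades toward chance at later positions, the qualitative pattern the abstract reports for models above ~1B parameters; the sub-400M regime would correspond to no recoverable signal at any position.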