[2603.11749] Truth as a Compression Artifact in Language Model Training
Computer Science > Computation and Language
arXiv:2603.11749 (cs)
[Submitted on 12 Mar 2026 (v1), last revised 3 Apr 2026 (this version, v3)]
Title: Truth as a Compression Artifact in Language Model Training
Authors: Konstantin Krestnikov
Abstract: Why do language models trained on contradictory data prefer correct answers? In controlled experiments with small transformers (3.5M--86M parameters), we show that this preference tracks the compressibility structure of errors rather than truth per se. We train GPT-2 style models on corpora where each mathematical problem appears with both correct and incorrect solutions -- a denoising design that directly models conflicting information about the same fact. When errors are random, models extract the correct signal with accuracy scaling from 65% to 85% with model size. When errors follow a coherent alternative rule system, accuracy drops to chance (~45--51%): the model cannot distinguish the false system from truth. A multi-rule experiment reveals a sharp crossover: a single coherent alternative rule eliminates truth bias entirely, but adding a second competing rule restores most of it (47%->78%), with continued growth through N=10 (88%). The same pattern reproduces on real Wikipedia text (71% vs 46%). We propose the Compression--Consistency Principle as an explanatory hypothesis: in these ...
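The denoising design described in the abstract, in which every problem appears once with a correct solution and once with an incorrect one, can be illustrated with a toy corpus builder. This is only a sketch under stated assumptions, not the paper's actual data pipeline: the choice of two-operand addition, the off-by-one "coherent" alternative rule, the error range for random mistakes, and every function name below are hypothetical and introduced purely for illustration.

```python
import random


def true_rule(a, b):
    """The correct rule: ordinary addition."""
    return a + b


def coherent_wrong_rule(a, b):
    """Hypothetical alternative rule system: always off by one.

    Because the same rule produces every error, the wrong answers are
    as regular (and so as compressible) as the right ones.
    """
    return a + b + 1


def random_wrong(a, b, rng):
    """Hypothetical random error: some value near, but never equal to,
    the correct sum, with no shared pattern across problems."""
    correct = true_rule(a, b)
    wrong = correct
    while wrong == correct:
        wrong = correct + rng.randint(-10, 10)
    return wrong


def build_corpus(n_problems, error_mode, seed=0):
    """Return shuffled lines in which each problem appears twice:
    once with the correct answer and once with an incorrect one."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_problems):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        if error_mode == "random":
            wrong = random_wrong(a, b, rng)
        elif error_mode == "coherent":
            wrong = coherent_wrong_rule(a, b)
        else:
            raise ValueError(f"unknown error_mode: {error_mode!r}")
        lines.append(f"{a}+{b}={true_rule(a, b)}")
        lines.append(f"{a}+{b}={wrong}")
    rng.shuffle(lines)
    return lines


if __name__ == "__main__":
    for mode in ("random", "coherent"):
        print(mode, build_corpus(3, mode, seed=1))
```

In the "random" corpus only the correct answers follow a single rule, whereas in the "coherent" corpus both answer streams are equally regular, which is the contrast the abstract associates with accuracy falling to chance.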