[2604.13275] Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size
Computer Science > Computation and Language

arXiv:2604.13275 (cs)

[Submitted on 14 Apr 2026]

Title: Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

Authors: Dikshant Kukreja (1), Kshitij Sah (1), Gautam Gupta (1), Avinash Anand (4), Rajiv Ratn Shah (1), Zhengkui Wang (4), Aik Beng Ng (3), Erik Cambria (2)
((1) IIIT Delhi, India, (2) Nanyang Technological University, (3) NVIDIA, (4) Singapore Institute of Technology)

Abstract: Larger language models become simultaneously better and worse at handling contextual information -- better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find that entrainment follows predictable power-law scaling, but with opposite trends depending on context type: semantic contexts show decreasing entrainment with scale, while non-semantic contexts show increasing entrainment. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest, yet simultaneously twice as prone to copying arbitrary tokens. These diverging trends...
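The page does not spell out how entrainment is quantified, but a minimal sketch of the general idea, assuming a HuggingFace causal LM (the smallest Pythia size named in the abstract) and a hypothetical probe prompt with an irrelevant distractor token, would compare the probability the model assigns to that token with and without it appearing in the context:

    # Illustrative sketch only: the paper's exact entrainment metric is not
    # given on this page; the model choice, prompts, and distractor token
    # are all assumptions.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/pythia-410m"
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def next_token_prob(prompt: str, target: str) -> float:
        """Probability the model assigns to `target` as the next token."""
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        target_id = tok.encode(target, add_special_tokens=False)[0]
        return probs[target_id].item()

    # Does mentioning an irrelevant token ("zebra") in the context
    # inflate its probability as a continuation elsewhere?
    clean = "The capital of France is"
    entrained = "Ignore this word: zebra. The capital of France is"
    ratio = next_token_prob(entrained, " zebra") / next_token_prob(clean, " zebra")
    print(f"entrainment effect: {ratio:.1f}x")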
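The abstract's claim that entrainment follows predictable power-law scaling suggests a fit of the form E(N) = a * N^(-alpha) over model size N, conventionally obtained by linear regression in log-log space. The (size, score) pairs below are placeholders, not results from the paper:

    # Sketch of a power-law fit E(N) = a * N**(-alpha) in log-log space.
    # The (size, entrainment) pairs are made-up placeholders.
    import numpy as np

    sizes = np.array([111e6, 1.3e9, 6.7e9, 13e9])     # parameter counts
    entrainment = np.array([0.40, 0.22, 0.13, 0.10])  # placeholder scores

    slope, intercept = np.polyfit(np.log(sizes), np.log(entrainment), 1)
    alpha, a = -slope, np.exp(intercept)
    print(f"E(N) ≈ {a:.3g} * N^(-{alpha:.3f})")

Under this parameterization, the semantic-context trend in the abstract corresponds to alpha > 0 (entrainment falls with scale) and the non-semantic trend to alpha < 0 (entrainment rises with scale).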