[2604.13275] Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

arXiv - Machine Learning · 4 min read

About this article

Abstract page for arXiv paper 2604.13275: Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

Computer Science > Computation and Language · arXiv:2604.13275 (cs) · Submitted on 14 Apr 2026

Title: Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size

Authors: Dikshant Kukreja (1), Kshitij Sah (1), Gautam Gupta (1), Avinash Anand (4), Rajiv Ratn Shah (1), Zhengkui Wang (4), Aik Beng Ng (3), Erik Cambria (2) ((1) IIIT Delhi, India, (2) Nanyang Technological University, (3) NVIDIA, (4) Singapore Institute of Technology)

Abstract: Larger language models become simultaneously better and worse at handling contextual information -- better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find entrainment follows predictable power-law scaling, but with opposite trends depending on context type: semantic contexts show decreasing entrainment with scale, while non-semantic contexts show increasing entrainment. Concretely, the largest models are four times more resistant to counterfactual misinformation than the smallest, yet simultaneously twice as prone to copying arbitrary tokens. These diverging trends…
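The abstract names power-law scaling but the snippet cuts off before giving the functional form. As a rough illustration only, here is a minimal sketch of how such a fit could look, assuming the law takes the standard form E(N) = a·N^b with N the parameter count and the sign of b distinguishing the two context types; the model sizes and entrainment scores below are invented placeholders, not values from the paper.

```python
import numpy as np

# Parameter counts roughly matching the Cerebras-GPT family (111M-13B).
sizes = np.array([111e6, 256e6, 590e6, 1.3e9, 2.7e9, 6.7e9, 13e9])

# Placeholder entrainment scores: NOT the paper's numbers, just shapes
# consistent with the abstract (semantic entrainment falls with scale,
# non-semantic entrainment rises).
semantic = np.array([0.42, 0.38, 0.33, 0.27, 0.22, 0.18, 0.15])
non_semantic = np.array([0.08, 0.10, 0.12, 0.15, 0.19, 0.24, 0.30])

def fit_power_law(n_params, scores):
    """Fit E(N) = a * N**b via least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(n_params), np.log(scores), 1)
    return np.exp(intercept), slope  # a, b

for name, scores in [("semantic", semantic), ("non-semantic", non_semantic)]:
    a, b = fit_power_law(sizes, scores)
    print(f"{name:>13} contexts: E(N) ~ {a:.3g} * N^{b:+.3f}")
```

In log-log space a power law is a straight line, so the fitted slope is the scaling exponent: a negative b recovers the decreasing semantic trend, a positive b the increasing non-semantic one.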

Originally published on April 16, 2026. Curated by AI News.

Related Articles

Treating enterprise AI as an operating layer | MIT Technology Review

There’s a fault line running through enterprise AI, and it’s not the one getting the most attention. The public conversation still tracks...

MIT Technology Review - AI · 7 min

emotion in llms

you know most human emotion is constructed, inferred, there is no root object, you can kind of create the emotion you want? well, i was l...

Reddit - Artificial Intelligence · 1 min

Making AI operational in constrained public sector environments | MIT Technology Review

The AI boom has hit across industries, and public sector organizations are facing pressure to accelerate adoption. At the same time, gove...

MIT Technology Review - AI · 8 min

[2510.19268] Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models

Abstract page for arXiv paper 2510.19268: Hierarchical DLO Routing with Reinforcement Learning and In-Context Vision-language Models

arXiv - Machine Learning · 4 min
