Making models larger actually makes them worse at ignoring irrelevant junk text.
April 16, 2026
Original Paper
Better and Worse with Scale: How Contextual Entrainment Diverges with Model Size
arXiv · 2604.13275
The Takeaway
The research reveals a scaling paradox: while bigger models get better at ignoring false claims in their context, they simultaneously become more prone to 'mindless copying' of irrelevant non-semantic tokens. This divergence in 'contextual entrainment' means larger models are more easily distracted by garbage in the prompt. The common assumption has been that scaling improves every aspect of handling context; this paper shows that 'smarter' models are uniquely vulnerable to being derailed by noisy context. For RAG practitioners, the implication is that better models may actually require *cleaner* context, not just more of it. It challenges the 'just throw it all in the prompt' philosophy of large context windows.
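To make 'contextual entrainment' concrete, here is a minimal sketch (not the paper's exact protocol) of how you might probe it with an off-the-shelf model: check how much an irrelevant distractor sentence in the prompt inflates the probability of a token it contains. The model choice, prompts, and scoring function below are illustrative assumptions.

```python
# Sketch only: probe contextual entrainment by comparing the probability a
# model assigns to an irrelevant token with and without a distractor in context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-410m"  # smallest Pythia size named in the abstract
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def next_token_logprob(prompt: str, target: str) -> float:
    """Log-probability the model assigns to `target` as the next token after `prompt`."""
    target_id = tok(target, add_special_tokens=False)["input_ids"][0]
    input_ids = tok(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
    return torch.log_softmax(logits, dim=-1)[target_id].item()

question = "The capital of France is"
distractor = "The word banana appears in this unrelated sentence. "

# One possible entrainment score: how much the distractor boosts the
# probability of the irrelevant token " banana" as the next token.
clean = next_token_logprob(question, " banana")
entrained = next_token_logprob(distractor + question, " banana")
print(f"log-prob shift toward ' banana': {entrained - clean:+.3f}")
```

Run over a family of model sizes, a probe like this is what lets you ask whether the shift grows or shrinks with scale.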
From the abstract
Larger language models become simultaneously better and worse at handling contextual information -- better at ignoring false claims, worse at ignoring irrelevant tokens. We formalize this apparent paradox through the first scaling laws for contextual entrainment, the tendency of models to favor tokens that appeared in context regardless of relevance. Analyzing the Cerebras-GPT (111M-13B) and Pythia (410M-12B) model families, we find entrainment follows predictable power-law scaling, but with opp…
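Since the abstract frames the result as a scaling law, here is a tiny sketch of the kind of fit that implies: entrainment as a power law in parameter count, estimated by a linear regression in log-log space. The data points are placeholders, not the paper's measurements.

```python
# Sketch of a power-law fit E(N) = a * N^b for entrainment vs. parameter count.
# The values below are illustrative placeholders, not results from the paper.
import numpy as np

# Approximate parameter counts spanning the Pythia range cited in the abstract.
params = np.array([410e6, 1.0e9, 2.8e9, 6.9e9, 12e9])
# Hypothetical entrainment scores at each size (e.g. measured with a probe
# like the one sketched above).
entrainment = np.array([0.12, 0.15, 0.19, 0.24, 0.28])

# Fit log E = b * log N + log a, i.e. a straight line in log-log space.
b, log_a = np.polyfit(np.log(params), np.log(entrainment), deg=1)
print(f"fitted power law: E(N) ~ {np.exp(log_a):.3g} * N^{b:.3f}")
```

A positive exponent b would mean entrainment grows with model size, which is the "worse with scale" half of the paper's divergence.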