AI models can actually get 'brain fog' where their old thoughts clutter up their heads so much they forget how to think straight.
April 13, 2026
Original Paper
Robust Reasoning Benchmark
arXiv · 2604.08571
The Takeaway
This paper shows that intermediate reasoning steps can act like cognitive clutter that degrades performance over the course of a single conversation. It suggests that more 'thinking' isn't always better: it can actually confuse the model by filling its context window with distracting noise.
From the abstract
While Large Language Models (LLMs) achieve high performance on standard mathematical benchmarks, their underlying reasoning processes remain highly overfit to standard textual formatting. We propose a perturbation pipeline consisting of 14 techniques to evaluate the robustness of LLM reasoning. We apply this pipeline to the AIME 2024 dataset and evaluate 8 state-of-the-art models on the resulting benchmark. While frontier models exhibit resilience, open-weight reasoning models suffer catastrophic collapse.
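The paper does not list its 14 techniques here, but the general idea of answer-preserving textual perturbation can be sketched. The two perturbations below (random whitespace noise and an irrelevant distractor sentence) are illustrative assumptions, not the authors' actual pipeline:

```python
import random


def perturb_whitespace(problem: str, seed: int = 0) -> str:
    """Inject random extra spaces/line breaks; the math content is unchanged."""
    rng = random.Random(seed)
    out = []
    for word in problem.split():
        out.append(word)
        if rng.random() < 0.3:  # occasionally add stray whitespace
            out.append("\n" if rng.random() < 0.5 else " ")
    return " ".join(out)


def perturb_distractor(problem: str) -> str:
    """Prepend an irrelevant sentence that does not affect the answer."""
    return "Note: the weather in Paris was mild that day. " + problem


problem = "Find the remainder when 2^10 is divided by 7."
perturbed = perturb_distractor(perturb_whitespace(problem))
print(perturbed)
```

A robust model should give the same answer to `problem` and `perturbed`; the benchmark measures how far accuracy drops when it does not.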