SeriesFusion
Science, curated & edited by AI

A tiny window of just 100 training steps determines whether a Transformer learns to reason or simply memorizes its homework.

Neural network training is not a slow, steady climb toward intelligence but a series of sharp tipping points. Within a narrow span of steps, a model commits either to a general reasoning rule or to brute-force memorization. Miss that window by even a small margin and the result behaves like a lookup table rather than a logical engine. The paper argues that the timing of complexity control matters as much as the data or the architecture itself, which suggests engineers may be wasting enormous compute by failing to nudge models toward reasoning during these fleeting moments.
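To make the idea concrete, here is a minimal PyTorch sketch of time-dependent complexity control: weight decay is held high during an early window of steps and relaxed once the window closes. The window length, decay values, toy model, and placeholder loss are illustrative assumptions, not the paper's actual experimental setup.

```python
import torch
from torch import nn

# Illustrative hyperparameters (assumptions, not the paper's values).
CRITICAL_WINDOW = 100   # the ~100-step window highlighted in the article
DECAY_IN_WINDOW = 1e-2  # strong complexity penalty early in training
DECAY_AFTER = 1e-4      # relaxed penalty once the window closes

model = nn.Linear(32, 32)  # stand-in for a Transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=DECAY_IN_WINDOW)

for step in range(1000):
    x = torch.randn(64, 32)
    loss = model(x).pow(2).mean()  # placeholder loss for the sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Once the critical window closes, relax the complexity penalty.
    if step == CRITICAL_WINDOW:
        for group in optimizer.param_groups:
            group["weight_decay"] = DECAY_AFTER
```

The design choice this sketch illustrates is that complexity control becomes a schedule rather than a single static hyperparameter: the penalty is concentrated in the early steps where, per the paper, the reasoning-versus-memorization decision is made.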

Original Paper

Critical Windows of Complexity Control: When Transformers Decide to Reason or Memorize

Sarwan Ali

arXiv  ·  2605.04396

Recent work has shown that Transformers' compositional generalization is governed by complexity control (initialization scale and weight decay), which steers training toward low-complexity reasoning solutions rather than high-complexity memorization. Existing analyses, however, treat complexity control as a single static hyperparameter choice, leaving open when during training this control is actually decisive. We show that the memorization-versus-reasoning fate of a Transformer is decided within a brief critical window, on the order of 100 training steps.