AI & ML Nature Is Weird

Transformers fail at symbolic logic because the "unembeddings" for new tokens collapse into a single shared vector during training.

April 24, 2026

Original Paper

To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning

Nevena Lazić, Liam Fowl, András György, Csaba Szepesvári

arXiv · 2604.21632

The Takeaway

This mechanistic blind spot explains why a model that understands a logical rule can still fail when you swap in new variable names. During training, the unembedding vectors in the network's final layer lose the ability to distinguish symbols that were not seen in the training data. Once those vectors collapse, the model cannot apply its reasoning to new or rare tokens, however well it has learned the rule itself. Most people assumed this was a failure of logic, but it is actually a failure of basic symbol recognition in the model's output layer. Fixing it requires a new way to initialize or train the final layers of the transformer, which gives researchers a direct target for making AI better at math and programming.
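The collapse dynamic can be seen in a toy simulation. The sketch below is not the paper's analysis; it is a minimal numpy illustration under one assumed mechanism: tokens that never appear as prediction targets only ever receive the softmax "denominator" gradient, which pushes all of their unembedding rows in the same direction every step, so they end up nearly parallel, and their logits nearly tied. All names (`W`, `mu`, `u`, the seen/unseen split) are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d, n_seen = 50, 16, 30
lr, wd, steps = 0.05, 1e-3, 4000

# Unembedding matrix: one output vector per vocabulary token, small random init.
W = rng.normal(0.0, 0.02, (vocab, d))

# Toy contexts: every hidden state shares a mean direction mu, plus a
# component identifying the target token. Only "seen" tokens (rows 0..29)
# ever occur as targets; rows 30..49 play the role of unseen symbols.
mu = np.ones(d)
u = rng.normal(0.0, 1.0, (n_seen, d))

for _ in range(steps):
    t = rng.integers(n_seen)                       # target is always a seen token
    h = mu + u[t] + 0.3 * rng.normal(0.0, 1.0, d)  # hidden state for this step
    logits = W @ h
    p = np.exp(logits - logits.max())
    p /= p.sum()
    grad = np.outer(p, h)                          # softmax cross-entropy gradient
    grad[t] -= h                                   # target row gets the pull-up term
    W -= lr * (grad + wd * W)                      # weight decay acts on every row

def mean_pairwise_cos(M):
    """Average cosine similarity over all distinct row pairs of M."""
    M = M / np.linalg.norm(M, axis=1, keepdims=True)
    C = M @ M.T
    return C[np.triu_indices(len(M), k=1)].mean()

cos_seen = mean_pairwise_cos(W[:n_seen])
cos_unseen = mean_pairwise_cos(W[n_seen:])
print(f"mean pairwise cosine, seen rows:   {cos_seen:.3f}")
print(f"mean pairwise cosine, unseen rows: {cos_unseen:.3f}")

# Consequence: for a fresh context, the unseen tokens' logits are nearly tied,
# so the model has no way to single out one unseen symbol to emit.
h_new = mu + rng.normal(0.0, 1.0, d)
logits = W @ h_new
print(f"logit spread over seen tokens:   {logits[:n_seen].std():.3f}")
print(f"logit spread over unseen tokens: {logits[n_seen:].std():.3f}")
```

In this toy run the unseen rows end up far more aligned with each other than the seen rows do, and their logits for a new context are nearly constant. That is the qualitative picture the paper's title gestures at: the failure lives in the output vectors, not in whatever circuit computes the logic.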

From the abstract

We investigate the ability of decoder-only transformer models to perform abstract symbolic reasoning; specifically solving propositional logic reasoning problems given in-context. Previous work demonstrated that models fail to generalize to problems involving variable names that were not observed during training, and it was shown that one reason behind this is the difficulty of copying (or generating) unseen tokens. We show both theoretically and empirically that a particular representational collapse …