Transformers fail at symbolic reasoning with unseen variable names because the "unembedding" vectors for new tokens collapse toward a single shared representation during training.
April 24, 2026
Original Paper
To See the Unseen: on the Generalization Ability of Transformers in Symbolic Reasoning
arXiv · 2604.21632
The Takeaway
This mechanistic blind spot explains why a model that has learned a logical rule can still fail when the variable names are swapped. During training, the unembedding vectors in the output layer for tokens that appear rarely or never collapse toward a single shared direction, so the model loses the ability to distinguish those symbols when generating output. The collapse makes it impossible to apply a learned rule to new or rare data points. This failure was widely assumed to be one of logic, but it is actually a failure of basic symbol representation in the output (unembedding) layer. Fixing it calls for new ways to initialize or train the transformer's final layer, and that gives a direct target for making models better at math and programming.
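A minimal sketch makes the mechanism concrete. Assume (our toy setup, not the paper's code) that the output layer scores each vocabulary token by a dot product between its unembedding row and the hidden state. If two unseen tokens share one collapsed row, their logits are identical for every hidden state, so no amount of correct reasoning upstream can make the model prefer the right one:

```python
import random

random.seed(0)
d = 16  # hidden dimension (arbitrary for illustration)

def dot(u, w):
    return sum(a * b for a, b in zip(u, w))

# Distinct unembedding rows for four frequently seen tokens.
seen_rows = [[random.gauss(0, 1) for _ in range(d)] for _ in range(4)]

# Two unseen tokens whose rows have collapsed to the same vector.
collapsed = [random.gauss(0, 1) for _ in range(d)]
unseen_rows = [collapsed, collapsed]

h = [random.gauss(0, 1) for _ in range(d)]  # any hidden state
logits = [dot(row, h) for row in seen_rows + unseen_rows]

# The two unseen tokens receive identical logits, hence identical
# probabilities after softmax, regardless of what h encodes.
assert logits[4] == logits[5]
```

The same argument holds after softmax: equal logits always yield equal probabilities, which is why the model cannot copy the correct unseen variable name even when the in-context rule is represented correctly.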
From the abstract
We investigate the ability of decoder-only transformer models to perform abstract symbolic reasoning; specifically solving propositional logic reasoning problems given in-context. Previous work demonstrated that models fail to generalize to problems involving variable names that were not observed during training, and it was shown that one reason behind this is the difficulty of copying (or generating) unseen tokens. We show both theoretically and empirically that a particular representational co