The modern Transformer architecture runs on the same five functional operations as a spiking Sparse Distributed Memory machine from 2007.
Silicon Valley treats the Transformer as a revolutionary invention that changed everything in 2017. This research shows that a spiking Sparse Distributed Memory machine from a decade earlier already used the same core primitives: both systems rely on cosine similarity and the same set of functional mappings to learn sequences. The convergence suggests there is a fundamentally correct way to process sequences that we are rediscovering rather than inventing. Understanding these older spiking architectures could point toward a generation of AI that is far more energy-efficient than today's GPU-based systems.
Spiking Sequence Machines and Transformers
arXiv · 2605.00662
Sequence learning reduces to similarity-based retrieval over a temporally indexed representation space, a constraint on any sequence model, not a property of a specific architecture. We show that a spiking Sparse Distributed Memory sequence machine (2007) and the transformer (2017) independently instantiate the same five functional operations (encoding, context maintenance, associative retrieval, storage, and decoding), with cosine similarity as the shared retrieval primitive in both. We formali
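The abstract names cosine similarity as the retrieval primitive shared by both systems. As a rough illustration of what that claim means, the sketch below contrasts a hard-thresholded, SDM-style read with a softmax, attention-style read over the same cosine scores. The function names, the threshold, and the temperature are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): both an SDM-style
# associative memory and transformer attention can be written as "score stored
# keys against a query with cosine similarity, then combine the stored values".
import numpy as np

def cosine_sim(query, keys):
    # Cosine similarity between one query vector and a matrix of key vectors.
    q = query / (np.linalg.norm(query) + 1e-12)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + 1e-12)
    return k @ q

def sdm_read(query, keys, values, threshold=0.5):
    # SDM-style retrieval (assumed simplification): hard-select addresses whose
    # similarity to the query exceeds a threshold, then pool their contents.
    sims = cosine_sim(query, keys)
    active = sims > threshold
    if not active.any():
        return np.zeros(values.shape[1])
    return values[active].mean(axis=0)

def attention_read(query, keys, values, temperature=0.1):
    # Attention-style retrieval: soft-select all addresses with a softmax over
    # the same similarities, then take the weighted sum of their contents.
    sims = cosine_sim(query, keys)
    weights = np.exp(sims / temperature)
    weights /= weights.sum()
    return weights @ values

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.standard_normal((8, 64))
    values = rng.standard_normal((8, 64))
    query = keys[3] + 0.1 * rng.standard_normal(64)  # noisy cue for item 3
    print(np.allclose(sdm_read(query, keys, values), values[3]))  # exact recall
    print(attention_read(query, keys, values)[:4])                # soft recall
```

In this toy, the only difference between the two readers is the selection rule (threshold versus softmax); the scoring step is identical, which is the kind of shared primitive the abstract points to.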