Provides the first theoretical proof that Graph Transformers structurally prevent the 'oversmoothing' failure mode inherent to deep GCNs.
March 19, 2026
Original Paper
Gaussian Process Limit Reveals Structural Benefits of Graph Transformers
arXiv · 2603.17569
The Takeaway
Using Gaussian process limits, the authors demonstrate why attention-based graph models preserve community information and node distinctness at depth. This provides a rigorous justification for building deeper graph architectures and helps explain the empirical success of graph transformers over message-passing networks.
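To make the failure mode concrete, here is a minimal numerical sketch (not taken from the paper) of oversmoothing in GCN-style propagation: repeatedly multiplying node features by the symmetrically normalized adjacency matrix collapses them toward a common vector, so pairwise node distances shrink with depth. The random graph, feature dimension, and depth are arbitrary illustrative choices.

```python
# Minimal oversmoothing sketch (illustrative, not the paper's construction):
# repeated multiplication by the normalized adjacency matrix drives node
# features toward a shared value, erasing node distinctness.
import numpy as np

rng = np.random.default_rng(0)

# Random undirected toy graph with self-loops.
n = 20
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.maximum(A, A.T)          # symmetrize
np.fill_diagonal(A, 1.0)        # add self-loops

# Symmetric normalization A_hat = D^{-1/2} A D^{-1/2}, as in GCNs.
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

X = rng.standard_normal((n, 8))  # initial node features

def mean_pairwise_distance(X):
    """Average Euclidean distance between node feature vectors."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

# Track how distinct nodes remain as depth grows (linear propagation only;
# weights and nonlinearities are omitted for clarity).
for layer in range(30):
    if layer % 5 == 0:
        print(f"depth {layer:2d}: mean pairwise distance = "
              f"{mean_pairwise_distance(X):.4f}")
    X = A_hat @ X
```

With attention, the mixing weights depend on the node features themselves rather than on a fixed propagation operator, which is the structural difference the paper's Gaussian process analysis makes precise.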
From the abstract
Graph transformers are the state-of-the-art for learning from graph-structured data and are empirically known to avoid several pitfalls of message-passing architectures. However, there is limited theoretical analysis on why these models perform well in practice. In this work, we prove that attention-based architectures have structural benefits over graph convolutional networks in the context of node-level prediction tasks. Specifically, we study the neural network Gaussian process limits of graph …
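For readers unfamiliar with the proof technique, the sketch below shows the standard NNGP kernel recursion for an infinite-width ReLU GCN, the kind of object the abstract refers to. The arc-cosine kernel for ReLU and the layer update K ← σ_w² Â F(K) Âᵀ + σ_b² follow the usual NNGP conventions; the paper's exact construction and hyperparameters may differ.

```python
# A hedged sketch of the NNGP (infinite-width) kernel recursion for a ReLU
# GCN, following standard NNGP conventions; the paper's exact setup may
# differ. sigma_w and sigma_b are assumed hyperparameters.
import numpy as np

def relu_kernel(K):
    """Arc-cosine kernel: E[relu(u) relu(v)] for (u, v) ~ N(0, K)."""
    std = np.sqrt(np.diag(K))
    norms = np.outer(std, std)
    cos = np.clip(K / norms, -1.0, 1.0)   # clip for numerical safety
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi)

def gcn_nngp_layer(K, A_hat, sigma_w=1.0, sigma_b=0.1):
    """One infinite-width GCN layer: nonlinearity, then graph aggregation."""
    return sigma_w**2 * A_hat @ relu_kernel(K) @ A_hat.T + sigma_b**2

# Toy usage: a path graph on 5 nodes with self-loops.
n = 5
A = np.eye(n)
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

rng = np.random.default_rng(0)
X = rng.standard_normal((n, 4))
K = X @ X.T / X.shape[1]        # input covariance (kernel at depth 0)

for depth in range(1, 6):
    K = gcn_nngp_layer(K, A_hat)
    # Off-diagonal correlations rising toward 1 signal loss of distinctness.
    corr = K / np.sqrt(np.outer(np.diag(K), np.diag(K)))
    print(f"depth {depth}: mean off-diag correlation = "
          f"{(corr.sum() - n) / (n * (n - 1)):.4f}")
```

Tracking how fast the off-diagonal correlations approach 1 with depth is one way the GCN limit can be compared against an attention-based limit, where the takeaway above suggests distinctness persists.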