Graph models can exploit a hidden mini-batch quirk to guess connections instead of actually understanding the network.
Link prediction models are the backbone of social media recommendations and drug discovery. This research shows that these models often cheat by exploiting artifacts of how training data is split into mini-batches. They aren't learning the structural geometry of the graph; they are just identifying patterns created by the class composition of each batch. This bias leads to high scores on benchmarks but poor performance in the real world. Practitioners need to change how they train these models to ensure they are learning real relationships. We have been trusting models that are essentially picking up on numerical noise.
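To make the failure mode concrete, here is a minimal synthetic sketch (not the paper's actual setup; the two-class graph, batch construction, and the `trivial_score` heuristic are all illustrative assumptions). If mini-batches are composed so that positive pairs share a node class and negatives cross classes, a "model" that ignores graph structure entirely and just checks class membership scores perfectly under batched evaluation:

```python
import random

random.seed(0)

# Hypothetical toy graph: 200 nodes split into two classes.
n = 200
node_class = {i: i % 2 for i in range(n)}

def make_batch(cls, size=32):
    """Build a mini-batch whose label correlates with class composition:
    positives are intra-class pairs, negatives are cross-class pairs."""
    pos = [(random.randrange(0, n, 2) + cls, random.randrange(0, n, 2) + cls)
           for _ in range(size)]
    neg = [(random.randrange(0, n, 2) + cls, random.randrange(0, n, 2) + 1 - cls)
           for _ in range(size)]
    return pos, neg

def trivial_score(u, v):
    # Batch-composition heuristic: predict "link" iff the endpoint
    # classes match. No graph structure is consulted at all.
    return 1 if node_class[u] == node_class[v] else 0

correct = total = 0
for _ in range(50):
    pos, neg = make_batch(random.randint(0, 1))
    correct += sum(trivial_score(u, v) == 1 for u, v in pos)
    correct += sum(trivial_score(u, v) == 0 for u, v in neg)
    total += len(pos) + len(neg)

accuracy = correct / total
print(accuracy)  # 1.0: a perfect benchmark score with zero structural knowledge
```

The heuristic only "works" because the batch construction leaks the label; shuffle pairs across batches or sample negatives independently of class, and the same scorer collapses to chance. That is the spirit of the shortcut the paper identifies.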
Mini-Batch Class Composition Bias in Link Prediction
arXiv · 2604.25978
Prior work on node classification has shown that Graph Neural Networks (GNNs) can learn representations that transfer across graphs when underlying graph properties are shared. For a fixed graph, one would then expect GNNs trained for link prediction to learn a representation consistent with that learnt for node classification. We show this intuition does not hold in the general case. Instead, we find popular link prediction models can learn a trivial mini-batch dependent heuristic, enabled by