Deep neural networks see the world through tiny textures, while humans identify objects by their overall global shape.
April 29, 2026
Original Paper
Spectral Analysis Reveals Fundamental Differences between Human and Deep Neural Network Shape Representations
SSRN · 6642465
The Takeaway
Spectral analysis shows that AI models are tuned to high-frequency local details that the human eye ignores. Humans rely on low-frequency global configurations to understand what an object actually is. This fundamental difference explains why AI can be easily tricked by a texture that looks right but is stretched over a nonsensical form. The way computers see a chair or a face is mathematically alien to the way a biological brain processes those images. Bridging this gap is the only way to create computer vision that is as reliable as human sight in the real world.
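The core idea of a spectral (frequency) analysis can be made concrete with a few lines of code. The sketch below is not the paper's method; it is a minimal illustration, using only NumPy's FFT, of how an image can be split into a low-frequency component (the coarse global shape humans rely on) and a high-frequency component (the fine local texture DNNs are said to latch onto). The `cutoff` parameter and the synthetic blob-plus-checkerboard test image are assumptions chosen for illustration.

```python
import numpy as np

def split_spectrum(image, cutoff=0.2):
    """Split an image into low- and high-frequency components.

    `cutoff` is a fraction of the Nyquist frequency: radial frequencies
    below it go to the low-pass (global shape) component; everything
    else ends up in the high-pass (local texture) component.
    """
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]  # cycles/pixel, vertical
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]  # cycles/pixel, horizontal
    radius = np.sqrt(fx**2 + fy**2)        # radial frequency of each bin
    low_mask = radius <= cutoff * 0.5      # 0.5 cycles/pixel = Nyquist
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = image - low                     # exact complement: fine detail
    return low, high

# Synthetic image: a coarse Gaussian blob ("shape") plus a fine
# checkerboard ("texture") at the highest representable frequency.
n = 64
y, x = np.mgrid[:n, :n]
shape_part = np.exp(-((x - n / 2) ** 2 + (y - n / 2) ** 2) / (2 * 12**2))
texture_part = 0.2 * ((x + y) % 2)
img = shape_part + texture_part

low, high = split_spectrum(img, cutoff=0.2)
```

With this split, `low` closely tracks the Gaussian blob while the checkerboard survives almost entirely in `high`, mirroring the article's claim that the two cue types live in separable frequency bands.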
From the abstract
While the human visual system is known to be highly sensitive to global and configural shape information, deep neural network models (DNNs) trained on ImageNet seem to favour local shape features. However, a more exact understanding of these differences has remained elusive, in part due to a lack of systematic methods for exploring the nature of high-dimensional shape representations.

Here we argue that a novel shape frequency analysis can provide important insights into these representations…