AI can be trained to "look" at images much the way a human does without losing any of its ability to identify what it sees.
April 24, 2026
Original Paper
Cognitive Alignment At No Cost: Inducing Human Attention Biases For Interpretable Vision Transformers
arXiv · 2604.20027
The Takeaway
Fine-tuning a Vision Transformer on eye-tracking data makes the model's attention patterns match human gaze. The common assumption was that forcing an AI to attend more like a human would make it less accurate, but this experiment showed zero loss in performance. The result points to a more interpretable kind of AI, one where we can trust that the model is weighting the same important features that we are. It directly addresses the black-box problem of a model getting the right answer for the wrong reason, which matters in high-stakes fields like medical imaging, where we need to know why the AI made a diagnosis. In short, we can now align AI perception with human intuition for free.
From the abstract
For state-of-the-art image understanding, Vision Transformers (ViTs) have become the standard architecture, but their processing diverges substantially from human attentional characteristics. We investigate whether this cognitive gap can be shrunk by fine-tuning the self-attention weights of Google's ViT-B/16 on human saliency fixation maps. To isolate the effects of semantically relevant signals from generic human supervision, the tuned model is compared against a shuffled control. […]
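
Only the abstract is excerpted here, so the paper's exact training objective isn't spelled out. As a rough illustration, here is a minimal PyTorch sketch of what attention-alignment fine-tuning could look like: a KL-divergence term pulls the model's [CLS]-to-patch attention toward a human fixation map, while the usual classification loss preserves accuracy. The loss form, the choice of the last attention block, the google/vit-base-patch16-224 checkpoint, and the 14x14 fixation-map resolution are all assumptions for illustration, not the paper's published recipe.

```python
# Minimal sketch of attention-alignment fine-tuning (assumed objective,
# not the paper's published method).
import torch
import torch.nn.functional as F
from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224"  # the ViT-B/16 checkpoint named in the abstract
)

def alignment_loss(pixel_values, labels, fixation_maps, lam=0.5):
    """fixation_maps: (batch, 14, 14) human saliency over the patch grid."""
    outputs = model(pixel_values=pixel_values, output_attentions=True)

    # Last-layer attention: (batch, heads, 197, 197); token 0 is [CLS].
    attn = outputs.attentions[-1]

    # [CLS] attention over the 196 patch tokens, averaged across heads,
    # renormalized to a probability distribution.
    cls_attn = attn[:, :, 0, 1:].mean(dim=1)
    cls_attn = cls_attn / cls_attn.sum(dim=-1, keepdim=True)

    # Human fixation map flattened and normalized the same way.
    human = fixation_maps.flatten(1)
    human = human / human.sum(dim=-1, keepdim=True)

    # KL(human || model): pushes model attention toward human fixations.
    kl = F.kl_div(cls_attn.clamp_min(1e-8).log(), human, reduction="batchmean")

    # Standard classification loss keeps task performance intact.
    ce = F.cross_entropy(outputs.logits, labels)
    return ce + lam * kl
```

Under this reading, the paper's shuffled control would correspond to permuting fixation_maps across unrelated images, so the supervision keeps generic human gaze statistics but carries no image-specific signal.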