Introduces a vision model testbed that aligns AI visual attention (scanpaths) with human gaze without sacrificing classification accuracy.
March 31, 2026
Original Paper
EVA: Bridging Performance and Human Alignment in Hard-Attention Vision Models for Image Classification
arXiv · 2603.27340
The Takeaway
The paper shows that the 'alignment tax' in vision models can be mitigated with neuroscience-inspired hard attention, offering a path toward models that are both performant and inherently more interpretable to human observers.
From the abstract
Optimizing vision models purely for classification accuracy can impose an alignment tax, degrading human-like scanpaths and limiting interpretability. We introduce EVA, a neuroscience-inspired hard-attention mechanistic testbed that makes the performance-human-likeness trade-off explicit and adjustable. EVA samples a small number of sequential glimpses using a minimal fovea-periphery representation with a CNN-based feature extractor and integrates variance control and adaptive gating to stabilize…
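The fovea-periphery representation described in the abstract can be illustrated with a minimal sketch: at each fixation, the model receives a small full-resolution foveal crop plus a larger, coarsely pooled peripheral crop, mimicking the retina's resolution falloff. All function names, patch sizes, and pooling factors below are illustrative assumptions, not EVA's actual implementation.

```python
import numpy as np

def glimpse(image, fix_y, fix_x, fovea=8, periphery=24, pool=3):
    """Extract a fovea-periphery glimpse at fixation (fix_y, fix_x).

    Hypothetical sketch: returns a full-resolution foveal patch and a
    coarsely average-pooled peripheral patch. `periphery` must be
    divisible by `pool`.
    """
    pad = periphery // 2
    # Pad edges so glimpses near the border stay in bounds.
    padded = np.pad(image, pad, mode="edge")
    cy, cx = fix_y + pad, fix_x + pad

    # Fovea: small, full-resolution crop centred on the fixation.
    fh = fovea // 2
    fov = padded[cy - fh:cy + fh, cx - fh:cx + fh]

    # Periphery: larger crop, average-pooled to discard fine detail.
    ph = periphery // 2
    per = padded[cy - ph:cy + ph, cx - ph:cx + ph]
    n = periphery // pool
    per = per.reshape(n, pool, n, pool).mean(axis=(1, 3))
    return fov, per

image = np.arange(64 * 64, dtype=float).reshape(64, 64)
fov, per = glimpse(image, 32, 32)
# Both views are small (here 8x8), so a sequence of a few glimpses
# sees far fewer pixels than the full image - the hard-attention budget.
```

A sequence of such glimpses, chosen by a learned policy, would form the model's scanpath, which can then be compared against human gaze data.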