Mechanistic analysis of 'counting circuits' in VLMs allows for lightweight interventions that improve general visual reasoning performance.
March 20, 2026
Original Paper
Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models
arXiv · 2603.18523
The Takeaway
The paper identifies the specific neural circuits responsible for object counting and shows that fine-tuning restricted to these circuits yields +8% gains on out-of-distribution (OOD) counting and +1.5% on general benchmarks. This provides a blueprint for targeted enhancement of model 'sub-skills' as an alternative to general fine-tuning.
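The paper's intervention amounts to updating only the parameters belonging to the identified circuit while freezing everything else. The sketch below illustrates that idea in miniature; the parameter names and the "counting circuit" subset are hypothetical, and the paper's actual implementation details are not reproduced here.

```python
# Minimal sketch of circuit-restricted fine-tuning: a gradient step is
# applied only to parameters inside the identified circuit; all other
# parameters stay frozen. Names below are illustrative, not from the paper.

def circuit_restricted_update(params, grads, circuit, lr=0.1):
    """Return updated parameters, stepping only the circuit members."""
    return {
        name: (value - lr * grads[name]) if name in circuit else value
        for name, value in params.items()
    }

params = {"vision.attn.q": 1.0, "vision.attn.k": 2.0, "lm.head": 3.0}
grads = {"vision.attn.q": 0.5, "vision.attn.k": 0.5, "lm.head": 0.5}
circuit = {"vision.attn.q"}  # hypothetical 'counting circuit' component

updated = circuit_restricted_update(params, grads, circuit)
print(updated)  # only vision.attn.q moves: 1.0 -> 0.95; the rest are frozen
```

In a real model the same effect is usually achieved by setting `requires_grad = False` on non-circuit parameters before building the optimizer, so the frozen weights are never touched.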
From the abstract
Counting serves as a simple but powerful test of a Large Vision-Language Model's (LVLM's) reasoning; it forces the model to identify each individual object and then add them all up. In this study, we investigate how LVLMs implement counting using controlled synthetic and real-world benchmarks, combined with mechanistic analyses. Our results show that LVLMs display a human-like counting behavior, with precise performance on small numerosities and noisy estimation for larger quantities. […]