Reveals a massive 'reasoning gap' in multilingual VLMs: accuracy drops by up to 25% when switching from English to Indian languages.
March 31, 2026
Original Paper
Do Multilingual VLMs Reason Equally? A Cross-Lingual Visual Reasoning Audit for Indian Languages
arXiv · 2603.26742
The Takeaway
It breaks the assumption that multilingual pretraining automatically transfers reasoning capabilities across scripts. The finding that chain-of-thought prompting actually degrades performance in specific languages, such as Bengali and Kannada, suggests that current 'global' models rely on deeply English-centric internal reasoning chains.
From the abstract
Vision-language models score well on mathematical, scientific, and spatial reasoning benchmarks, yet these evaluations are overwhelmingly English. I present the first cross-lingual visual reasoning audit for Indian languages. 980 questions from MathVista, ScienceQA, and MMMU are translated into Hindi, Tamil, Telugu, Bengali, Kannada, and Marathi using IndicTrans2, with Gemini 2.0 Flash cross-verification on 50 samples per language (inter-translator agreement 0.79-0.84). Eight VLMs, from 7B open-
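The headline 'reasoning gap' metric described above is just per-language accuracy measured against the English baseline. A minimal sketch of that aggregation (function names and the toy records are illustrative, not the paper's actual harness):

```python
from collections import defaultdict

def accuracy_by_language(results):
    """Aggregate per-language accuracy from (language, correct) records."""
    totals = defaultdict(lambda: [0, 0])  # lang -> [num_correct, num_total]
    for lang, correct in results:
        totals[lang][0] += int(correct)
        totals[lang][1] += 1
    return {lang: c / n for lang, (c, n) in totals.items()}

def reasoning_gap(acc, reference="en"):
    """Accuracy drop (in percentage points) relative to the reference language."""
    ref = acc[reference]
    return {lang: round(100 * (ref - a), 1)
            for lang, a in acc.items() if lang != reference}

# Toy records; the paper's 980-question sets would be scored the same way.
records = [("en", True), ("en", True), ("en", False), ("en", True),
           ("bn", True), ("bn", False), ("bn", False), ("bn", False)]
acc = accuracy_by_language(records)   # {"en": 0.75, "bn": 0.25}
gaps = reasoning_gap(acc)             # {"bn": 50.0}
```

Running each of the eight VLMs over the translated question sets and feeding the graded answers through a routine like this yields one gap figure per (model, language) pair.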