Your multilingual AI is likely 'faking' scripts, making Indic languages look like Hindi despite perfect fluency.
April 14, 2026
Original Paper
Script Collapse in Multilingual ASR: Defining and Measuring Script Fidelity Rate
arXiv · 2604.08786
The Takeaway
Script collapse in models like Whisper v3 creates fluent text but in the wrong writing system. It reveals a critical failure mode where phonetic accuracy is high but cultural and orthographic fidelity is non-existent.
From the abstract
Word error rate (WER) is the dominant metric for automatic speech recognition, yet it cannot detect a systematic failure mode: models that produce fluent output in the wrong writing system. We define Script Fidelity Rate (SFR), the fraction of hypothesis characters in the target script block, computable without reference transcriptions, and report the first systematic measurement of script collapse across six languages spanning four writing systems (Pashto, Urdu, Hindi, Bengali, Malayalam, Somal