AI & ML Nature Is Weird

Your multilingual speech model may be 'faking' scripts: it transcribes Indic languages fluently, but in a writing system that makes them look like Hindi.

April 14, 2026

Original Paper

Script Collapse in Multilingual ASR: Defining and Measuring Script Fidelity Rate

arXiv · 2604.08786

The Takeaway

In models such as Whisper v3, script collapse produces fluent text in the wrong writing system. It is a critical failure mode in which phonetic accuracy stays high while orthographic and cultural fidelity is lost.

From the abstract

Word error rate (WER) is the dominant metric for automatic speech recognition, yet it cannot detect a systematic failure mode: models that produce fluent output in the wrong writing system. We define Script Fidelity Rate (SFR), the fraction of hypothesis characters in the target script block, computable without reference transcriptions, and report the first systematic measurement of script collapse across six languages spanning four writing systems (Pashto, Urdu, Hindi, Bengali, Malayalam, Somal
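The SFR definition quoted above (the fraction of hypothesis characters that fall in the target script block) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the function name script_fidelity_rate, the SCRIPT_BLOCKS table, the specific Unicode ranges, and the choice to count only letters and combining marks (skipping spaces, digits, and punctuation) are all assumptions made here.

```python
import unicodedata

# Hypothetical table of Unicode block ranges per script (inclusive code points).
# The paper's exact block definitions are not given in the excerpt above.
SCRIPT_BLOCKS = {
    "arabic": [(0x0600, 0x06FF), (0x0750, 0x077F)],
    "devanagari": [(0x0900, 0x097F)],
    "bengali": [(0x0980, 0x09FF)],
    "malayalam": [(0x0D00, 0x0D7F)],
    "latin": [(0x0041, 0x005A), (0x0061, 0x007A)],
}


def script_fidelity_rate(hypothesis, script):
    """Fraction of counted characters that lie in the target script's blocks.

    Assumption: only letters (category L*) and combining marks (M*) are
    counted, so whitespace, digits, and punctuation do not affect the score.
    Returns None when the hypothesis contains no countable characters.
    """
    ranges = SCRIPT_BLOCKS[script]
    counted = [
        ch for ch in hypothesis
        if unicodedata.category(ch)[0] in ("L", "M")
    ]
    if not counted:
        return None
    in_script = sum(
        1 for ch in counted
        if any(lo <= ord(ch) <= hi for lo, hi in ranges)
    )
    return in_script / len(counted)


if __name__ == "__main__":
    # Bengali text scored against the Bengali block: full fidelity.
    print(script_fidelity_rate("আমি", "bengali"))  # 1.0
    # Devanagari output where Bengali was expected: complete script collapse.
    print(script_fidelity_rate("मैं", "bengali"))  # 0.0
```

Because the score depends only on the hypothesis string and the expected script, no reference transcription is needed, which matches the property the abstract highlights. A low value flags output that may be fluent but sits in the wrong writing system, something WER by itself would not reveal.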