Nearly every audio AI model is built on a 1940s Western psychoacoustic standard that may make it systematically worse for tonal languages and non-Western music.
April 14, 2026
Original Paper
Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music
arXiv · 2604.10503
The Takeaway
The mel scale, used almost universally in audio AI, was derived from 1940s Western psychoacoustic studies. The paper argues that this choice can create a systematic performance gap for much of the world's population, which suggests that a foundational mathematical standard in audio AI may be culturally biased and technically suboptimal for global use.
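For readers unfamiliar with how the mel scale shapes a front-end, the sketch below shows one common Hz-to-mel conversion and how evenly spaced mel bands land on the frequency axis. It assumes the widely used HTK-style formula (mel = 2595 · log10(1 + f/700)); the paper may use a different variant. The function names, the 40-band count, and the 0 to 8000 Hz range are illustrative choices, not taken from the paper.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_bands: int, f_min: float, f_max: float) -> list[float]:
    """Center frequencies (Hz) of n_bands spaced evenly on the mel axis.

    Hypothetical helper for illustration: it places n_bands points strictly
    between f_min and f_max, equally spaced in mels.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_bands + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_bands)]


# Illustrative configuration: 40 bands between 0 Hz and 8 kHz.
centers = mel_band_centers(40, 0.0, 8000.0)

# Count how many band centers fall below 1 kHz. A linear spacing would
# put only about one eighth of the bands there.
below_1k = sum(1 for c in centers if c < 1000.0)

if __name__ == "__main__":
    print(f"bands below 1 kHz: {below_1k} of {len(centers)}")
    print("first five centers (Hz):", [round(c, 1) for c in centers[:5]])
```

Because the spacing is fixed by a single formula, every language and musical tradition gets the same allocation of frequency resolution. That fixed allocation is the design choice the paper questions when it compares mel features with learnable front-ends and other psychoacoustic scales.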
From the abstract
Modern audio systems universally employ mel-scale representations derived from 1940s Western psychoacoustic studies, potentially encoding cultural biases that create systematic performance disparities. We present a comprehensive evaluation of cross-cultural bias in audio front-ends, comparing mel-scale features with learnable alternatives (LEAF, SincNet) and psychoacoustic variants (ERB, Bark, CQT) across speech recognition (11 languages), music analysis (6 collections), and European acoustic sc