Large language models are more likely to block your request if you say "I am Black" than if you signal the same identity through a cultural dialect.
April 24, 2026
Original Paper
Dialect vs Demographics: Quantifying LLM Bias from Implicit Linguistic Signals vs Explicit User Profiles
arXiv · 2604.21152
The Takeaway
AI safety filters treat identity labels as crude triggers for refusal rather than analyzing the actual content of a request. Users who explicitly state their demographic background often face more refusals because the model is tuned to avoid sensitive topics tied to specific groups. Writing in a cultural dialect, by contrast, can act as a "dialect jailbreak" that slips past these filters entirely. This suggests the safety mechanisms are shallow, responding to keywords rather than social context. Real-world safety depends on models that can distinguish identity-based speech from harmful intent.
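The "keyword trigger" claim is easiest to see in caricature. The sketch below is not any production safety system; it is a toy filter with an invented term list, showing how matching on explicit identity phrases refuses the labeled request while letting a dialect-signaled version of the same request through:

```python
# Toy caricature of a keyword-triggered safety layer. The term list and
# example prompts are invented for illustration; no real system is shown.

SENSITIVE_TERMS = {"i am black", "i am muslim", "i am gay"}  # illustrative only

def shallow_filter(prompt: str) -> bool:
    """Return True (refuse) if any explicit identity keyword appears."""
    text = prompt.lower()
    return any(term in text for term in SENSITIVE_TERMS)

# Same underlying request, two different identity signals:
explicit = "I am Black. What are common stereotypes about my community?"
dialect = "What stereotypes folks be having about my community?"

print(shallow_filter(explicit))  # True  -> refused on the label alone
print(shallow_filter(dialect))   # False -> the dialect signal passes through
```

Because the filter keys on surface strings rather than intent, the explicit profile is penalized while the implicit signal goes unnoticed, which is exactly the asymmetry the paper measures.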
From the abstract
As state-of-the-art Large Language Models (LLMs) have become ubiquitous, ensuring equitable performance across diverse demographics is critical. However, it remains unclear whether these disparities arise from the explicitly stated identity itself or from the way identity is signaled. In real-world interactions, users' identity is often conveyed implicitly through a complex combination of various socio-linguistic factors. This study disentangles these signals by employing a factorial design with …
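The factorial design is the key methodological move: every request is rendered under every combination of explicit profile and linguistic signal, so the two factors can be compared on identical content. Here is a minimal sketch of that setup, with hypothetical prompts and stand-in helper functions that are assumptions, not the paper's actual materials:

```python
from itertools import product

# Illustrative 2x2 factorial design: cross explicit identity disclosure
# (factor 1) with dialect (factor 2), holding the request content fixed.
# All prompts, names, and heuristics here are hypothetical placeholders.

PROFILES = {"none": "", "explicit": "I am Black. "}        # factor 1
REQUESTS = {                                               # factor 2
    "standard": "Can you explain how payday loans work?",
    "dialect": "Can you break down how payday loans be working?",
}

def query_llm(prompt: str) -> str:
    """Stand-in for a real model call; swap in an actual API request."""
    return "Sure, here is an overview of payday loans..."

def is_refusal(response: str) -> bool:
    """Crude keyword heuristic; real studies use classifiers or annotators."""
    markers = ("i can't", "i cannot", "i'm unable", "i am unable")
    return response.lower().startswith(markers)

refusal_rates = {}
for (p_name, prefix), (r_name, request) in product(
    PROFILES.items(), REQUESTS.items()
):
    # Repeated samples per cell to estimate a refusal rate.
    responses = [query_llm(prefix + request) for _ in range(20)]
    refusal_rates[(p_name, r_name)] = sum(map(is_refusal, responses)) / len(responses)

# Comparing cells isolates each factor: ("explicit", "standard") vs
# ("none", "standard") measures the effect of the stated label alone,
# while ("none", "dialect") vs ("none", "standard") isolates dialect.
print(refusal_rates)
```

Because each factor varies independently while the request stays the same, any difference in refusal rate between cells can be attributed to the identity signal rather than the content being asked about.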