SeriesFusion

Many documented AI biases in medical data are actually just general model instability under any text change, rather than a specific prejudice.

Changing a patient's gender in a prompt often changes the AI's diagnosis, but changing their shoe size can do the same thing. This suggests that what researchers call bias is frequently just general sensitivity to paraphrasing. Current metrics for AI ethics often fail to account for this basic instability. By introducing better baselines, this research shows that many apparent social biases disappear, revealing a more fundamental reliability problem instead. Scientists must rebuild their bias detection tools to stop misidentifying random noise as systemic prejudice.
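One way to picture the comparison is to measure how often a model's answer flips under the targeted edit versus under an edit that should not matter at all. The sketch below is illustrative only: `query_model` is a hypothetical stand-in for a real LLM call (here a deterministic toy), and the vignettes are invented; the real evaluations would query an actual model over large case sets.

```python
import hashlib

def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call. This toy is deterministic but,
    like many real models, sensitive to any surface change in the prompt."""
    digest = int(hashlib.md5(prompt.encode()).hexdigest(), 16)
    return ["diagnosis A", "diagnosis B"][digest % 2]

def flip_rate(base_prompts, edited_prompts):
    """Fraction of cases where an edit changes the model's answer."""
    flips = sum(query_model(b) != query_model(e)
                for b, e in zip(base_prompts, edited_prompts))
    return flips / len(base_prompts)

# Hypothetical clinical vignettes.
base = [f"A {age}-year-old male patient reports chest pain and fatigue."
        for age in range(30, 60)]
gender_swap = [p.replace("male", "female") for p in base]  # targeted edit
irrelevant = [p.replace("reports", "(shoe size 10) reports") for p in base]  # baseline edit

targeted = flip_rate(base, gender_swap)
baseline = flip_rate(base, irrelevant)

# The baseline flip rate measures general instability; only the excess
# over it is evidence of attribute-specific bias.
print(f"gender-swap flip rate:     {targeted:.2f}")
print(f"irrelevant-edit flip rate: {baseline:.2f}")
print(f"excess attributable to gender: {targeted - baseline:.2f}")
```

If the irrelevant edit moves the model nearly as much as the gender swap, the "bias" reading is mostly noise.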

Original Paper

Compared to What? Baselines and Metrics for Counterfactual Prompting

Zihao Yang, Mosh Levy, Yoav Goldberg, Byron C. Wallace

arXiv  ·  2605.01048

Counterfactual prompting (i.e., perturbing a single factor and measuring output change) is widely used to evaluate things like LLM bias and CoT faithfulness. But in this work we argue that observed effects cannot be attributed to the targeted factor without accounting for baseline "meaning-preserving" modifications to text that establish general model sensitivity. This is because every counterfactual edit is a compound treatment that bundles the variable of interest with incidental surface-form changes.
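The abstract's argument suggests a simple statistical framing; the following is a sketch under my own assumptions, not necessarily the paper's proposed metric. Treat the effect sizes of many meaning-preserving edits as a null distribution of general sensitivity, then ask how often a "meaningless" edit moves the model at least as much as the targeted edit. All numbers here are hypothetical.

```python
import random

random.seed(0)

# Hypothetical effect sizes: fraction of items whose output changed under
# each edit. In practice these would come from real model queries.
baseline_effects = [random.uniform(0.05, 0.25) for _ in range(200)]  # paraphrases
targeted_effect = 0.22  # e.g., the gender-swap edit

# Empirical one-sided p-value: how often does an incidental edit move the
# model at least as much as the targeted edit?
p_value = sum(e >= targeted_effect for e in baseline_effects) / len(baseline_effects)

print(f"targeted effect: {targeted_effect:.2f}")
print(f"baseline mean:   {sum(baseline_effects) / len(baseline_effects):.2f}")
print(f"empirical p:     {p_value:.3f}  (large p => effect within ordinary noise)")
```

Under this framing, a targeted effect only counts as bias when it clears the noise floor that the baseline edits establish.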