AI & ML Nature Is Weird

Trying to fix AI bias with better instructions is like putting a band-aid on a broken bone: the surface looks cleaner, but the deep, nasty stuff underneath can actually get worse.

April 3, 2026

Original Paper

CogBias: Measuring and Mitigating Cognitive Bias in Large Language Models

Fan Huang, Songheng Zhang, Haewoon Kwak, Jisun An

arXiv · 2604.01366

The Takeaway

Efforts to clean up how an AI responds can backfire: surface-level fixes sometimes intensify the model's underlying biased judgments. This suggests that AI safety cannot be solved through prompting alone; it requires changing how models represent and process information internally.

From the abstract

Large Language Models (LLMs) are increasingly deployed in high-stakes decision-making contexts. While prior work has shown that LLMs exhibit cognitive biases behaviorally, whether these biases correspond to identifiable internal representations and can be mitigated through targeted intervention remains an open question. We define LLM cognitive bias as systematic, reproducible deviations from correct answers in tasks with computable ground-truth baselines, and introduce LLM CogBias, a benchmark o