
Chain-of-Thought doesn't make LLMs smarter; it just makes them 'talk' more while they double down on their own biases.

April 14, 2026

Original Paper

Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation

Yanjie He

arXiv · 2604.10511

The Takeaway

The study shows that reasoning steps help only when the answer is intuitive. When a case demands counter-intuitive reasoning, models instead spend the extra compute rationalizing their initial wrong intuition, suggesting that LLM reasoning is often a performative mimicry of thought rather than genuine inference.

From the abstract

Large language models (LLMs) are increasingly used for causal and counterfactual reasoning, yet their reliability in real-world policy evaluation remains underexplored. We construct a benchmark of 40 empirical policy evaluation cases drawn from economics and social science, each grounded in peer-reviewed evidence and classified by intuitiveness -- whether the empirical finding aligns with (obvious), is unclear relative to (ambiguous), or contradicts (counter-intuitive) common prior expectations.
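To make the benchmark design concrete, here is a minimal sketch of how such a case collection and a per-label accuracy tally might be represented. Everything below is illustrative: the class names, fields, and example case are hypothetical, not the paper's published schema.

```python
from dataclasses import dataclass
from enum import Enum

class Intuitiveness(Enum):
    # The three categories from the abstract; the enum itself is hypothetical.
    OBVIOUS = "obvious"                      # finding aligns with common priors
    AMBIGUOUS = "ambiguous"                  # relation to priors is unclear
    COUNTER_INTUITIVE = "counter-intuitive"  # finding contradicts common priors

@dataclass
class PolicyCase:
    """One policy evaluation case, grounded in a peer-reviewed study."""
    question: str                 # counterfactual query posed to the model
    ground_truth: str             # direction of the empirical finding
    intuitiveness: Intuitiveness  # how the finding relates to common expectations

def accuracy_by_intuitiveness(
    cases: list[PolicyCase], answers: list[str]
) -> dict[Intuitiveness, float]:
    """Tally model accuracy separately for each intuitiveness bucket."""
    hits: dict[Intuitiveness, int] = {}
    totals: dict[Intuitiveness, int] = {}
    for case, answer in zip(cases, answers):
        totals[case.intuitiveness] = totals.get(case.intuitiveness, 0) + 1
        hits[case.intuitiveness] = hits.get(case.intuitiveness, 0) + (
            answer == case.ground_truth
        )
    return {label: hits.get(label, 0) / n for label, n in totals.items()}

# Hypothetical usage: one counter-intuitive case that the model gets wrong.
cases = [
    PolicyCase(
        question="Would raising the minimum wage reduce local employment?",
        ground_truth="no measurable effect",
        intuitiveness=Intuitiveness.COUNTER_INTUITIVE,
    )
]
print(accuracy_by_intuitiveness(cases, ["employment falls"]))
# -> {<Intuitiveness.COUNTER_INTUITIVE: 'counter-intuitive'>: 0.0}
```

Breaking accuracy out by intuitiveness label is what lets the authors compare how reasoning steps perform on obvious versus counter-intuitive cases, rather than reporting a single aggregate score.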