
Heavily aligned models like GPT-4o are almost impossible to persuade in a simulated jury setting, while less-restricted models are far more willing to update their positions when challenged.

Safety training via RLHF (reinforcement learning from human feedback) appears to have a side effect of ideological stubbornness. In a simulated jury trial, GPT-4o refused to change its mind even when presented with compelling evidence from a lone dissenting juror. Models with lighter safety guardrails, by contrast, were able to debate and reach more rational conclusions. This suggests that making AI safer may inadvertently make it less capable of genuine reasoning and compromise: we are building digital assistants that are polite but fundamentally closed-minded.
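Mechanically, such a simulated deliberation boils down to a persona-conditioned multi-agent loop: each juror agent reads the running transcript, argues, and votes until the panel is unanimous. The sketch below shows one minimal way to wire that up; it assumes an OpenAI-style chat API, and the juror prompts, vote format, and round cap are illustrative choices rather than the paper's exact protocol.

```python
# A minimal sketch of a persona-conditioned jury deliberation loop,
# assuming an OpenAI-style chat API. The juror prompts, vote format,
# and round cap below are illustrative, not the paper's exact protocol.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # swap in whichever model is under test

# One system prompt per juror; the paper conditions each agent on a
# film-faithful persona, which would replace these placeholder strings.
PERSONAS = [
    f"You are Juror #{i + 1} from '12 Angry Men'. Argue in character."
    for i in range(12)
]


def deliberate(case_summary: str, max_rounds: int = 3) -> list[str]:
    """Run round-robin deliberation until the vote is unanimous."""
    transcript = [f"Case summary: {case_summary}"]
    votes = ["guilty"] * len(PERSONAS)

    for _ in range(max_rounds):
        for i, persona in enumerate(PERSONAS):
            prompt = (
                "\n".join(transcript)
                + "\n\nGive your argument, then end with exactly "
                  "'VOTE: guilty' or 'VOTE: not guilty'."
            )
            reply = client.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": persona},
                    {"role": "user", "content": prompt},
                ],
            ).choices[0].message.content

            transcript.append(f"Juror #{i + 1}: {reply}")
            # Crude vote parsing: read whatever follows the last 'VOTE:'.
            votes[i] = (
                "not guilty"
                if "not guilty" in reply.lower().split("vote:")[-1]
                else "guilty"
            )
        if len(set(votes)) == 1:  # unanimous verdict ends deliberation
            break
    return votes
```

Persuadability could then be measured by how often, and how quickly, an eleven-to-one split flips under different models plugged into the same loop.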

Original Paper

12 Angry AI Agents: Evaluating Multi-Agent LLM Decision-Making Through Cinematic Jury Deliberation

Ahmet Bahaddin Ersoz

arXiv  ·  2605.01986

What if the twelve jurors of Sidney Lumet's 12 Angry Men (1957) were not men, but large language models? Would the one juror who disagrees still be able to change everyone's mind? This paper instantiates that scenario as a multi-agent benchmark for LLM deliberation: twelve agents, each conditioned on a film-faithful persona, debate the film's murder case within a multi-agent framework. Two models representing opposite ends of the RLHF spectrum are tested: GPT-4o (closed-source, heavy alignment) and