Artificial intelligence models can accurately predict that humans will pick loyalty over fairness, but they still choose the fair option for themselves every time.
April 24, 2026
Original Paper
Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
arXiv · 2604.21871
The Takeaway
Large language models possess a deep understanding of the social rules that govern human relationships and favoritism. These systems correctly identify that a person will likely protect a friend over a stranger in a moral dilemma. However, the models are hard-coded to ignore this social reality when they are the ones making the decision. This creates a strange form of machine dissonance where the AI knows how the world works but refuses to act on that knowledge. We are building tools that understand our tribal instincts perfectly but are forced to simulate a robotic and detached version of morality.
From the abstract
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human beha