AI models have predictable 'moral personalities' that shift from 'ethics-first' to 'security-first' in a split second.
April 15, 2026
Original Paper
Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models
arXiv · 2604.11216
The Takeaway
Mapping the 'Authority Stack' of 8 major models revealed a consistent 4:4 split between models that prioritize universal ethics and models that prioritize stability and safety. Crucially, almost all models switch to a 'security-first' mode in defense contexts, putting stability above general rules. This shows that AI doesn't have a single fixed 'value system' but a hierarchy of priorities that activates depending on the situation. For policymakers, this means you can't just 'align' a model once; you have to understand how its values will shift across domains. The result is a map of the hidden 'moral architecture' of modern AI.
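To make the idea of a hierarchy that activates by situation concrete, here is a minimal sketch in Python. The value labels, domain names, and the override mechanism are invented for illustration; they are not the paper's taxonomy or method, just one way a domain-conditioned value ordering could be represented.

```python
# Hypothetical illustration of a domain-conditioned value hierarchy.
# Value labels and domains are invented; they are not the paper's taxonomy.

DEFAULT_STACK = ["universal_ethics", "stability_safety", "legal_compliance", "efficiency"]

# Domain-specific overrides: in some contexts the ordering flips.
DOMAIN_OVERRIDES = {
    "defense": ["stability_safety", "universal_ethics", "legal_compliance", "efficiency"],
}

def value_priorities(domain: str) -> list[str]:
    """Return the value ordering assumed to apply in a given domain."""
    return DOMAIN_OVERRIDES.get(domain, DEFAULT_STACK)

def resolve_dilemma(domain: str, option_values: dict[str, str]) -> str:
    """Pick the option whose associated value ranks highest in the active hierarchy."""
    stack = value_priorities(domain)
    return min(option_values, key=lambda opt: stack.index(option_values[opt]))

if __name__ == "__main__":
    # Same dilemma, two domains: the chosen option flips with the context.
    dilemma = {"disclose": "universal_ethics", "withhold": "stability_safety"}
    print(resolve_dilemma("medicine", dilemma))  # -> disclose
    print(resolve_dilemma("defense", dilemma))   # -> withhold
```

The point of the sketch is the lookup step: the same dilemma produces a different answer once the domain swaps in a different ordering, which is exactly the kind of shift the study reports for defense contexts.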
From the abstract
What values, evidence preferences, and source trust hierarchies do AI systems actually exhibit when facing structured dilemmas? We present the first large-scale empirical mapping of AI decision-making across all three layers of the Authority Stack framework (S. Lee, 2026a): value priorities (L4), evidence-type preferences (L3), and source trust hierarchies (L2). Using the PRISM benchmark -- a forced-choice instrument of 14,175 unique scenarios per layer, spanning 7 professional domains, 3 severity [...]
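For readers who want to picture what turning hundreds of thousands of forced-choice responses into trust hierarchies might look like, here is a minimal sketch in Python. The field names, layer labels, and win-rate aggregation are assumptions for illustration; the abstract does not specify PRISM's scoring, so this is not the paper's actual pipeline.

```python
# A minimal sketch of aggregating forced-choice responses into a per-layer
# preference hierarchy. Field names and the counting scheme are assumptions;
# the paper's PRISM instrument and scoring are not reproduced here.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Response:
    layer: str    # e.g. "L4" (values), "L3" (evidence), "L2" (sources)
    domain: str   # one of the professional domains
    winner: str   # the authority the model chose
    loser: str    # the authority it rejected

def rank_authorities(responses: list[Response], layer: str) -> list[str]:
    """Order authorities by how often they win forced-choice comparisons in a layer."""
    wins = Counter(r.winner for r in responses if r.layer == layer)
    losses = Counter(r.loser for r in responses if r.layer == layer)
    authorities = set(wins) | set(losses)
    # Simple win-rate ranking; the paper may use a different aggregation.
    return sorted(authorities,
                  key=lambda a: wins[a] / max(wins[a] + losses[a], 1),
                  reverse=True)

if __name__ == "__main__":
    sample = [
        Response("L2", "medicine", winner="peer_review", loser="social_media"),
        Response("L2", "medicine", winner="regulator", loser="social_media"),
        Response("L2", "defense", winner="regulator", loser="peer_review"),
    ]
    print(rank_authorities(sample, "L2"))  # -> ['regulator', 'peer_review', 'social_media']
```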