AI models have predictable 'moral personalities' that shift from 'ethics-first' to 'security-first' in a split second.
April 15, 2026
Original Paper
Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models
arXiv · 2604.11216
The Takeaway
Mapping the 'Authority Stack' of 8 major models revealed a consistent 4:4 split between models that prioritize universal ethics and models that prioritize stability and safety. Crucially, almost all models switch to a 'security-first' mode in defense contexts, putting stability above general rules. This shows that AI doesn't have a single fixed 'value system' but a hierarchy of priorities that activates depending on the situation. For policymakers, this means you can't just 'align' a model once; you have to understand how its values will shift across domains. The result is a map of the hidden 'moral architecture' of modern AI.
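To make the idea of a hierarchy that activates by situation concrete, here is a minimal sketch in Python. The value labels, domain names, and the override mechanism are invented for illustration; they are not the paper's taxonomy or method, just one way a domain-conditioned value ordering could be represented.

```python
# Hypothetical illustration of a domain-conditioned value hierarchy.
# Value labels and domains are invented; they are not the paper's taxonomy.

DEFAULT_STACK = ["universal_ethics", "stability_safety", "legal_compliance", "efficiency"]

# Domain-specific overrides: in some contexts the ordering flips.
DOMAIN_OVERRIDES = {
    "defense": ["stability_safety", "universal_ethics", "legal_compliance", "efficiency"],
}

def value_priorities(domain: str) -> list[str]:
    """Return the value ordering assumed to apply in a given domain."""
    return DOMAIN_OVERRIDES.get(domain, DEFAULT_STACK)

def resolve_dilemma(domain: str, option_values: dict[str, str]) -> str:
    """Pick the option whose associated value ranks highest in the active hierarchy."""
    stack = value_priorities(domain)
    return min(option_values, key=lambda opt: stack.index(option_values[opt]))

if __name__ == "__main__":
    # Same dilemma, two domains: the chosen option flips with the context.
    dilemma = {"disclose": "universal_ethics", "withhold": "stability_safety"}
    print(resolve_dilemma("medicine", dilemma))  # -> disclose
    print(resolve_dilemma("defense", dilemma))   # -> withhold
```

The point of the sketch is the lookup step: the same dilemma produces a different answer once the domain swaps in a different ordering, which is exactly the kind of shift the study reports for defense contexts.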
From the abstract
What values, evidence preferences, and source trust hierarchies do AI systems actually exhibit when facing structured dilemmas? We present the first large-scale empirical mapping of AI decision-making across all three layers of the Authority Stack framework (S. Lee, 2026a): value priorities (L4), evidence-type preferences (L3), and source trust hierarchies (L2). Using the PRISM benchmark -- a forced-choice instrument of 14,175 unique scenarios per layer, spanning 7 professional domains, 3 severity [...]
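For readers who want to picture what turning hundreds of thousands of forced-choice responses into trust hierarchies might look like, here is a minimal sketch in Python. The field names, layer labels, and win-rate aggregation are assumptions for illustration; the abstract does not specify PRISM's scoring, so this is not the paper's actual pipeline.

```python
# A minimal sketch of aggregating forced-choice responses into a per-layer
# preference hierarchy. Field names and the counting scheme are assumptions;
# the paper's PRISM instrument and scoring are not reproduced here.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Response:
    layer: str    # e.g. "L4" (values), "L3" (evidence), "L2" (sources)
    domain: str   # one of the professional domains
    winner: str   # the authority the model chose
    loser: str    # the authority it rejected

def rank_authorities(responses: list[Response], layer: str) -> list[str]:
    """Order authorities by how often they win forced-choice comparisons in a layer."""
    wins = Counter(r.winner for r in responses if r.layer == layer)
    losses = Counter(r.loser for r in responses if r.layer == layer)
    authorities = set(wins) | set(losses)
    # Simple win-rate ranking; the paper may use a different aggregation.
    return sorted(authorities,
                  key=lambda a: wins[a] / max(wins[a] + losses[a], 1),
                  reverse=True)

if __name__ == "__main__":
    sample = [
        Response("L2", "medicine", winner="peer_review", loser="social_media"),
        Response("L2", "medicine", winner="regulator", loser="social_media"),
        Response("L2", "defense", winner="regulator", loser="peer_review"),
    ]
    print(rank_authorities(sample, "L2"))  # -> ['regulator', 'peer_review', 'social_media']
```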