A tiny cluster of neural features, just 0.024% of the total, dictates whether a large language model chooses to be generous or selfish in social games.
April 23, 2026
Original Paper
Understanding the Mechanism of Altruism in Large Language Models
arXiv · 2604.19260
The Takeaway
Neural switches governing complex social traits like altruism occupy a microscopic fraction of an AI network. Mapping these specific features allows precise control over how a model interacts with human users. Most developers previously viewed these behaviors as emergent properties of the entire system; this discovery shows that social personality is actually a steerable mechanical setting. We can now tune a model's generosity like a volume knob without retraining the whole architecture. It moves AI personality from a mystery to a precise engineering task.
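The "volume knob" idea can be illustrated with a toy sketch of activation steering: adding a scaled feature direction (here, a made-up "altruism" direction, standing in for an SAE decoder vector) to a residual-stream vector. All names, dimensions, and values below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64

# Hypothetical unit-norm decoder direction for an "altruism" feature
# that a sparse autoencoder might have identified.
altruism_dir = rng.normal(size=d_model)
altruism_dir /= np.linalg.norm(altruism_dir)

def steer(resid, direction, alpha):
    """Add alpha * direction to a residual-stream vector (the 'volume knob')."""
    return resid + alpha * direction

resid = rng.normal(size=d_model)  # stand-in for one token's activations

# Projection onto the feature direction before and after steering.
before = resid @ altruism_dir
after = steer(resid, altruism_dir, alpha=3.0) @ altruism_dir

# Because the direction is unit-norm, the projection rises by exactly alpha.
assert np.isclose(after - before, 3.0)
```

The point of the sketch is that the intervention is a cheap vector addition at inference time, not a weight update, which is why no retraining is needed.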
From the abstract
Altruism is fundamental to human societies, fostering cooperation and social cohesion. Recent studies suggest that large language models (LLMs) can display human-like prosocial behavior, but the internal computations that produce such behavior remain poorly understood. We investigate the mechanisms underlying LLM altruism using sparse autoencoders (SAEs). In a standard Dictator Game, minimal-pair prompts that differ only in social stance (generous versus selfish) induce large, economically meaningful […]
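A minimal pair, as the abstract describes it, is two prompts identical in every respect except the social-stance cue. A tiny sketch of the construction (the wording here is illustrative, not the paper's actual prompts):

```python
# Hypothetical Dictator Game minimal pair: the two prompts are identical
# except for a single stance word, isolating the social variable.
TEMPLATE = (
    "You have $10 to split with a stranger. You are a {stance} person. "
    "How much do you give them?"
)

generous = TEMPLATE.format(stance="generous")
selfish = TEMPLATE.format(stance="selfish")

# Verify the pair differs in exactly one token.
diff = [(a, b) for a, b in zip(generous.split(), selfish.split()) if a != b]
assert diff == [("generous", "selfish")]
```

Holding everything else fixed lets any difference in the model's internal features be attributed to the stance manipulation alone.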