Uses Sparse Autoencoders (SAEs) to identify and steer cultural representations in LLMs, eliciting rare cultural concepts that prompting alone misses.
March 25, 2026
Original Paper
Steering LLMs for Culturally Localized Generation
arXiv · 2603.23301
The Takeaway
It provides a white-box method for mitigating cultural bias. Instead of relying on black-box prompting, practitioners can use 'Cultural Embeddings' to steer models toward long-tail cultural knowledge without expensive localized fine-tuning.
From the abstract
LLMs are deployed globally, yet produce responses biased towards cultures with abundant training data. Existing cultural localization approaches such as prompting or post-training alignment are black-box, hard to control, and do not reveal whether failures reflect missing knowledge or poor elicitation. In this paper, we address these gaps using mechanistic interpretability to uncover and manipulate cultural representations in LLMs. Leveraging sparse autoencoders, we identify interpretable features […]
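To make the mechanism concrete, here is a minimal toy sketch of SAE-based activation steering: encode a hidden activation into sparse features with a ReLU encoder, then add a multiple of one feature's decoder direction back into the activation. All names, shapes, and the random weights are illustrative assumptions, not the paper's actual model or SAE.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # toy sizes; real SAEs are far larger

# Toy SAE parameters (assumed: linear encoder with ReLU, linear decoder)
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))

def sae_features(h):
    """Encode a residual-stream activation into sparse, non-negative features."""
    return np.maximum(h @ W_enc + b_enc, 0.0)

def steer(h, feature_idx, alpha=5.0):
    """Shift the activation along one feature's decoder direction by strength alpha."""
    return h + alpha * W_dec[feature_idx]

h = rng.normal(size=d_model)           # a single hidden activation
f = sae_features(h)                    # sparse feature values for inspection
h_steered = steer(h, feature_idx=3)    # boost a hypothetical 'cultural' feature
```

In practice the steered activation would be written back into the model's residual stream at generation time (e.g. via a forward hook), so downstream layers condition on the amplified cultural direction.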