Reveals that many 'polysemantic' neurons in LLMs are actually firing for shared word forms (lexical) rather than compressed semantic concepts.
April 2, 2026
Original Paper
Polysemanticity or Polysemy? Lexical Identity Confounds Superposition Metrics
arXiv · 2604.00443
The Takeaway
A critical finding for mechanistic interpretability: the paper shows that 18-36% of Sparse Autoencoder (SAE) features blend word senses due to lexical confounds. Filtering these features improves performance on knowledge editing and word sense disambiguation.
From the abstract
If the same neuron activates for both "lender" and "riverside," standard metrics attribute the overlap to superposition: the neuron must be compressing two unrelated concepts. This work explores how much of the overlap is instead due to a lexical confound: neurons fire for a shared word form (such as "bank") rather than for two compressed concepts. A 2x2 factorial decomposition reveals that the lexical-only condition (same word, different meaning) consistently exceeds the semantic-only condition (different word, same meaning).
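The 2x2 design can be illustrated with a toy simulation. This is a hypothetical sketch, not the paper's code: activation vectors are synthetic Gaussians in which a word's representation is the sum of a word-form (lexical) component and a meaning component, with the lexical component deliberately weighted more heavily to mirror the reported finding. All names and magnitudes here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256  # toy activation dimensionality

def cosine(a, b):
    """Cosine similarity between two activation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Illustrative components (assumption: lexical identity contributes more
# strongly to the activation than meaning, per the paper's finding).
LEX_SCALE = 2.0
form_bank, form_lender = rng.normal(size=d), rng.normal(size=d)
meaning_finance, meaning_river = rng.normal(size=d), rng.normal(size=d)

def activation(word_form, meaning):
    # Activation = weighted word-form component + meaning component.
    return LEX_SCALE * word_form + meaning

# The four cells of the 2x2 factorial design
# (same/different word form x same/different meaning).
pairs = {
    "same word, same meaning": (
        activation(form_bank, meaning_finance),
        activation(form_bank, meaning_finance),
    ),
    "same word, diff meaning (lexical-only)": (
        activation(form_bank, meaning_finance),   # "bank" = lender
        activation(form_bank, meaning_river),     # "bank" = riverside
    ),
    "diff word, same meaning (semantic-only)": (
        activation(form_bank, meaning_finance),
        activation(form_lender, meaning_finance),
    ),
    "diff word, diff meaning": (
        activation(form_bank, meaning_finance),
        activation(form_lender, meaning_river),
    ),
}

results = {name: cosine(a, b) for name, (a, b) in pairs.items()}
for name, sim in results.items():
    print(f"{name}: {sim:.2f}")
```

With the lexical component weighted at 2x, the lexical-only overlap exceeds the semantic-only overlap, reproducing the qualitative pattern the decomposition is designed to detect; a naive superposition metric would misread that lexical-only overlap as concept compression.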