Sparse Autoencoders (SAEs) can be used to build retrieval models that outperform traditional vocabulary-based sparse retrieval in multilingual settings.
March 17, 2026
Original Paper
Learning Retrieval Models with Sparse Autoencoders
arXiv · 2603.13277
The Takeaway
By moving sparse retrieval from the vocabulary space to the latent feature space, this method (SPLARE) produces more expressive, language-agnostic embeddings: matching happens over learned semantic features rather than language-specific tokens. That is a significant shift for RAG systems that need to scale across languages and domains without specialized vocabularies.
From the abstract
Sparse autoencoders (SAEs) provide a powerful mechanism for decomposing the dense representations produced by Large Language Models (LLMs) into interpretable latent features. We posit that SAEs constitute a natural foundation for Learned Sparse Retrieval (LSR), whose objective is to encode queries and documents into high-dimensional sparse representations optimized for efficient retrieval. In contrast to existing LSR approaches that project input sequences into the vocabulary space, SAE-based re…
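To make the core idea concrete, here is a minimal sketch of SAE-style sparse encoding for retrieval: a dense LLM embedding is mapped through an SAE encoder (affine map + ReLU), only the top-k latent activations are kept, and query/document relevance is a sparse dot product. This is not the paper's actual architecture; the dimensions, weights, and top-k sparsification here are illustrative assumptions (real SAE weights would be pretrained).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's configuration).
d_model, d_latent, top_k = 64, 512, 16

# Hypothetical SAE encoder parameters; random here for the sketch.
W_enc = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_model)
b_enc = np.zeros(d_latent)

def sae_encode(x: np.ndarray, k: int = top_k) -> np.ndarray:
    """Map a dense embedding to a sparse latent vector: ReLU, then keep top-k."""
    z = np.maximum(W_enc @ x + b_enc, 0.0)   # non-negative latent activations
    # Zero out everything except the k largest activations.
    drop = np.argsort(z)[:-k]
    z[drop] = 0.0
    return z

query_emb = rng.standard_normal(d_model)     # stand-in for an LLM embedding
doc_emb = rng.standard_normal(d_model)

q_sparse = sae_encode(query_emb)
d_sparse = sae_encode(doc_emb)

# Relevance is a dot product over sparse latents, which is the operation
# an inverted index over latent features would accelerate.
score = float(q_sparse @ d_sparse)
print(np.count_nonzero(q_sparse), q_sparse.shape)
```

The key contrast with vocabulary-space LSR (e.g. SPLADE-style models) is the axis of sparsity: here the nonzero dimensions are learned latent features rather than tokens of a specific language's vocabulary, which is what makes the representation language-agnostic.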