Spectral Tempering achieves near-oracle embedding compression for dense retrieval without requiring labeled data or a grid search.
March 23, 2026
Original Paper
Spectral Tempering for Embedding Compression in Dense Passage Retrieval
arXiv · 2603.19339
The Takeaway
Spectral Tempering replaces manual hyperparameter tuning for dimensionality reduction with an automated method derived from the corpus eigenspectrum. This allows large reductions in vector database costs without the performance degradation typical of post-hoc PCA or whitening.
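To make the two baselines concrete, here is a minimal NumPy sketch of the generic spectral scaling family that the abstract describes, in which each retained principal direction is reweighted by a power of its eigenvalue. It is an illustration under stated assumptions, not the paper's method: the function name `spectral_scale`, the exponent convention `eigenvalue ** (-gamma / 2)`, and the clipping floor are all choices made here. With that convention, `gamma = 0` gives plain PCA truncation and `gamma = 1` gives PCA whitening. How Spectral Tempering selects the coefficient from the eigenspectrum is not shown.

```python
import numpy as np


def spectral_scale(embeddings, k, gamma):
    """Project onto the top-k principal directions and rescale each one.

    Hypothetical helper for illustration only. Assumed convention:
    direction i is multiplied by eigenvalue_i ** (-gamma / 2), so
    gamma = 0 is PCA truncation and gamma = 1 is PCA whitening.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    Xc = X - X.mean(axis=0)                      # center the corpus embeddings
    cov = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    lam = np.clip(eigvals[order], 1e-12, None)   # guard against zero eigenvalues
    W = eigvecs[:, order] * lam ** (-gamma / 2.0)
    return Xc @ W, lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic anisotropic "embeddings": 500 vectors, 8 dimensions.
    X = rng.normal(size=(500, 8)) * np.array([5, 4, 3, 2, 1, 0.5, 0.2, 0.1])
    for gamma in (0.0, 0.5, 1.0):
        Z, _ = spectral_scale(X, k=4, gamma=gamma)
        print(gamma, Z.var(axis=0, ddof=1).round(3))
```

Intermediate values of `gamma` flatten the spectrum only partially, which is the trade-off between keeping dominant variance and enforcing isotropy. In a retrieval setting the same centering and projection would be fitted on the corpus and then applied to queries as well.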
From the abstract
Dimensionality reduction is critical for deploying dense retrieval systems at scale, yet mainstream post-hoc methods face a fundamental trade-off: principal component analysis (PCA) preserves dominant variance but underutilizes representational capacity, while whitening enforces isotropy at the cost of amplifying noise in the heavy-tailed eigenspectrum of retrieval embeddings. Intermediate spectral scaling methods unify these extremes by reweighting dimensions with a power coefficient $\gamma$,