Sparse Autoencoder analysis reveals that weight pruning counter-intuitively preserves rare features better than frequent ones.
March 27, 2026
Original Paper
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
arXiv · 2603.25325
The Takeaway
This challenges the conventional wisdom that pruning primarily removes low-importance or rarely activated features. For practitioners, it suggests that compressed models may retain specialized knowledge while losing general-purpose fluency, which fundamentally changes how pruned language models should be evaluated and probed.
From the abstract
Weight pruning is a standard technique for compressing large language models, yet its effect on learned internal representations remains poorly understood. We present the first systematic study of how unstructured pruning reshapes the feature geometry of language models, using Sparse Autoencoders (SAEs) as interpretability probes. Across three model families (Gemma 3 1B, Gemma 2 2B, Llama 3.2 1B), two pruning methods (magnitude and Wanda), and six sparsity levels (0–60%), we investigate five re
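To make the setup concrete, here is a minimal sketch of unstructured magnitude pruning, the simpler of the two methods the paper studies: the smallest-magnitude fraction of weights in a matrix is zeroed, swept across sparsity levels like the paper's 0–60% range. The function name and the numpy-based setup are illustrative, not taken from the paper; Wanda differs in that it scores weights by magnitude times input activation norm rather than magnitude alone.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero the smallest-|w| fraction of entries."""
    if sparsity <= 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Sweep sparsity levels on a toy weight matrix (hypothetical stand-in for a layer)
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
for s in (0.0, 0.2, 0.4, 0.6):
    Wp = magnitude_prune(W, s)
    print(f"sparsity {s:.0%}: fraction zeroed = {np.mean(Wp == 0):.2%}")
```

In the paper's setting the pruned weights would then be frozen and an SAE trained (or applied) to the model's activations to compare feature geometry before and after pruning; the sketch above only reproduces the compression step.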