Sparse Autoencoder analysis reveals that weight pruning counter-intuitively preserves rare features better than frequent ones.
March 27, 2026
Original Paper
How Pruning Reshapes Features: Sparse Autoencoder Analysis of Weight-Pruned Language Models
arXiv · 2603.25325
The Takeaway
This challenges the conventional wisdom that pruning primarily removes low-importance or rarely activated features. For practitioners, it suggests that compressed models may retain specialized knowledge while losing general-purpose fluency, which fundamentally changes how pruned language models should be evaluated and probed.
From the abstract
Weight pruning is a standard technique for compressing large language models, yet its effect on learned internal representations remains poorly understood. We present the first systematic study of how unstructured pruning reshapes the feature geometry of language models, using Sparse Autoencoders (SAEs) as interpretability probes. Across three model families (Gemma 3 1B, Gemma 2 2B, Llama 3.2 1B), two pruning methods (magnitude and Wanda), and six sparsity levels (0–60%), we investigate five re
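To make the setup concrete, here is a minimal sketch of unstructured magnitude pruning, the simpler of the two methods the paper studies: the smallest-magnitude fraction of weights in a matrix is zeroed, swept across sparsity levels like the paper's 0–60% range. The function name and the numpy-based setup are illustrative, not taken from the paper; Wanda differs in that it scores weights by magnitude times input activation norm rather than magnitude alone.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Unstructured magnitude pruning: zero the smallest-|w| fraction of entries."""
    if sparsity <= 0:
        return weights.copy()
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Sweep sparsity levels on a toy weight matrix (hypothetical stand-in for a layer)
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
for s in (0.0, 0.2, 0.4, 0.6):
    Wp = magnitude_prune(W, s)
    print(f"sparsity {s:.0%}: fraction zeroed = {np.mean(Wp == 0):.2%}")
```

In the paper's setting the pruned weights would then be frozen and an SAE trained (or applied) to the model's activations to compare feature geometry before and after pruning; the sketch above only reproduces the compression step.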