Empirically shows that most Transformer layers are redundant, enabling a 54% training cost reduction through non-uniform budget allocation.
March 23, 2026
Original Paper
Anatomical Heterogeneity in Transformer Language Models
arXiv · 2603.19348
The Takeaway
The study identifies 'anti-layers' whose removal actually improves performance and a 'critical core' of layers that are 10^7 times more important than others. Allocating compute according to this anatomical heterogeneity yields 4.7x lower loss at identical parameter counts.
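The "anti-layer" and "critical core" findings come from layer-ablation probing. As a rough illustration of that kind of probe (not the paper's actual protocol), the sketch below skips each decoder block of a causal LM in turn and records the change in language-modeling loss; the checkpoint ID, the toy evaluation text, and the Llama-style `model.model.layers` layout are assumptions, not details taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repo id and toy eval text; the paper's setup is not specified here.
MODEL_ID = "HuggingFaceTB/SmolLM2-135M"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model.eval()

eval_text = "The quick brown fox jumps over the lazy dog. " * 20
inputs = tokenizer(eval_text, return_tensors="pt")

@torch.no_grad()
def lm_loss(m):
    # Standard next-token loss: labels are the input ids themselves.
    return m(**inputs, labels=inputs["input_ids"], use_cache=False).loss.item()

baseline = lm_loss(model)
layers = model.model.layers  # ModuleList of decoder blocks (Llama-style layout, assumed)

degradation = {}
for i in range(len(layers)):
    removed = layers[i]
    del layers[i]                       # ablate block i by removing it from the stack
    degradation[i] = lm_loss(model) - baseline
    layers.insert(i, removed)           # restore original depth before the next ablation

# Negative degradation would correspond to "anti-layers" (removal helps);
# the largest positive values would mark the "critical core".
for i, d in sorted(degradation.items(), key=lambda kv: kv[1]):
    print(f"layer {i:2d}: delta loss = {d:+.4f}")
```

Ranking layers by this delta is one simple way to decide where a non-uniform training budget should concentrate compute.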
From the abstract
Current transformer language models are trained with uniform computational budgets across all layers, implicitly assuming layer homogeneity. We challenge this assumption through empirical analysis of SmolLM2-135M, a 30-layer, 135M-parameter causal language model, using five diagnostic metrics: weight predictability (R²), ablation degradation, recovery speed, weight manipulation robustness, and structural analysis. We find profound anatomical heterogeneity: (1) Layer weights follow strong mathema…
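The abstract names "weight predictability (R²)" as its first metric but is truncated before defining it. One plausible, hedged reading is "how well per-layer weight statistics follow a smooth trend across depth". The sketch below fits a quadratic in layer index to one illustrative statistic (the Frobenius norm of each block's MLP down-projection) and reports R²; the choice of statistic, the quadratic fit, and the checkpoint ID are assumptions, not the paper's definitions.

```python
import numpy as np
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-135M")  # assumed repo id

with torch.no_grad():
    # Illustrative per-layer statistic; the Llama-style block layout is assumed.
    norms = np.array([
        layer.mlp.down_proj.weight.norm().item() for layer in model.model.layers
    ])
depth = np.arange(len(norms))

# Fit a simple quadratic trend across depth and report R^2 as a "weight predictability" proxy.
coeffs = np.polyfit(depth, norms, deg=2)
pred = np.polyval(coeffs, depth)
ss_res = np.sum((norms - pred) ** 2)
ss_tot = np.sum((norms - norms.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"R^2 of quadratic fit to per-layer weight norms: {r2:.3f}")
```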