Depth-Recurrent Transformers decouple computational depth from parameter count, revealing a 'computational frontier' where performance on reasoning tasks snaps from zero to perfect depending on the number of iteration steps.
March 24, 2026
Original Paper
Thinking Deeper, Not Longer: Depth-Recurrent Transformers for Compositional Generalization
arXiv · 2603.21676
The Takeaway
The paper argues that for compositional tasks like graph reachability or nested logic, models need 'thinking time' (recurrence) rather than more parameters. This provides a blueprint for models that can dynamically trade inference compute for reasoning depth.
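The core mechanic — one shared-weight block applied repeatedly, so depth grows without adding parameters — can be sketched in a few lines. This is an illustrative NumPy stand-in, not the paper's architecture: the residual MLP step and all names here are assumptions, standing in for the shared Transformer block.

```python
import numpy as np

def shared_block(h, W, b):
    # Stand-in for one shared-weight Transformer block: a residual update.
    # (Illustrative only; the paper's block is a full Transformer layer.)
    return h + np.tanh(h @ W + b)

def depth_recurrent_forward(x, W, b, steps):
    # Parameter count is fixed (one W, one b); computational depth = `steps`.
    h = x
    for _ in range(steps):
        h = shared_block(h, W, b)
    return h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))
b = np.zeros(8)
x = rng.normal(size=(1, 8))

# Same parameters, different effective depth chosen at inference time.
shallow = depth_recurrent_forward(x, W, b, steps=2)
deep = depth_recurrent_forward(x, W, b, steps=16)
```

At inference, `steps` can be raised for harder inputs (e.g. longer graph traversals) without retraining, which is the compute-for-depth trade the takeaway describes.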
From the abstract
Standard Transformers have a fixed computational depth, fundamentally limiting their ability to generalize to tasks requiring variable-depth reasoning, such as multi-hop graph traversal or nested logic. We propose a depth-recurrent Transformer that decouples computational depth from parameter count by iteratively applying a shared-weight Transformer block in latent space -- enabling the model to trade recurrence steps for deeper reasoning at inference time. Our architecture incorporates three me…