Combines the YOCO architecture with recursive computation to scale representational depth without inflating the KV cache.
April 2, 2026
Original Paper
Universal YOCO for Efficient Depth Scaling
arXiv · 2604.01220
The Takeaway
Enables much deeper model reasoning (test-time scaling) with a KV cache whose size stays constant as effective depth grows, addressing one of the primary hardware bottlenecks of long-context, compute-heavy inference.
From the abstract
The rise of test-time scaling has remarkably boosted the reasoning and agentic proficiency of Large Language Models (LLMs). Yet standard Transformers struggle to scale inference-time compute efficiently, as conventional looping strategies suffer from high computational overhead and a KV cache that inflates alongside model depth. We present Universal YOCO (YOCO-U), which combines the YOCO decoder-decoder architecture with recursive computation to achieve a synergistic effect greater than either approach alone.
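The combination the abstract describes can be illustrated with a toy sketch: a single weight-tied block is re-applied to deepen computation (the recursive part), while every recursive step cross-attends to one shared key/value cache produced up front (the YOCO-style part), so the cache footprint does not grow with effective depth. This is a minimal illustration under assumed simplifications, not the paper's implementation; all names (`recursive_block`, `kv_cache`, the residual/tanh update) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, seq_len = 16, 8  # toy model dim and sequence length

# One weight-tied projection: re-applying the SAME weights deepens
# computation without adding parameters or per-layer KV caches.
W = rng.standard_normal((D, D)) / np.sqrt(D)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def recursive_block(x, kv, steps):
    """Apply the shared block `steps` times, cross-attending to one
    fixed KV cache (hypothetical stand-in for YOCO's shared cache)."""
    for _ in range(steps):
        attn = softmax(x @ kv["k"].T / np.sqrt(D)) @ kv["v"]
        x = np.tanh((x + attn) @ W) + x  # residual update
    return x

# YOCO-style cache: keys/values are produced once and reused by
# every recursive step, so cache size is independent of depth.
kv_cache = {"k": rng.standard_normal((seq_len, D)),
            "v": rng.standard_normal((seq_len, D))}

x = rng.standard_normal((seq_len, D))
shallow = recursive_block(x, kv_cache, steps=2)
deep = recursive_block(x, kv_cache, steps=12)

# Cache footprint is identical whether we loop 2 or 12 times.
cache_bytes = kv_cache["k"].nbytes + kv_cache["v"].nbytes
print(deep.shape, cache_bytes)
```

In a standard Transformer, each of the 12 "deep" steps would contribute its own layer of keys and values; here the cache stays a single `(seq_len, D)` pair of arrays regardless of recursion depth, which is the memory property the takeaway above refers to.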