SpecForge provides an open-source framework and high-quality draft models (SpecBundle) to make speculative decoding production-ready.
March 20, 2026
Original Paper
SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
arXiv · 2603.18567
The Takeaway
SpecForge makes training state-of-the-art speculative decoding models (EAGLE-3) broadly accessible, filling a gap left by the lack of scalable training infrastructure. The accompanying pre-trained draft models for mainstream LLMs can be used in production without further training, with reported inference speedups of up to 4.48x.
From the abstract
Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge
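The draft-then-verify loop described in the abstract can be illustrated with a minimal sketch. This is not SpecForge's API and not EAGLE-3: `target_next` and `draft_next` are hypothetical toy functions standing in for a large model and a lightweight draft model, and only the greedy case is shown, where a drafted token is accepted if it equals the target's own next token. Sampling-based variants use a probabilistic acceptance rule that is not covered here.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_next(ctx: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic function of the context."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except when the
    context length is a multiple of 4, where it guesses wrong on purpose."""
    tok = target_next(ctx)
    return (tok + 1) % 50 if len(ctx) % 4 == 0 else tok


def greedy_decode(prompt: List[int], target: NextToken, num_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    prompt: List[int], target: NextToken, draft: NextToken, num_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target scores all k+1 positions. A real system does this in
        #    one batched forward pass; the loop here only simulates it.
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        target_passes += 1
        # 3. Accept the longest prefix on which draft and target agree.
        n_accept = 0
        while n_accept < k and proposal[n_accept] == verified[n_accept]:
            n_accept += 1
        # 4. Keep the accepted tokens plus the target's own token at the first
        #    mismatch (or a bonus token if every drafted token was accepted).
        tokens.extend(proposal[:n_accept])
        tokens.append(verified[n_accept])
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    out, passes = speculative_decode(prompt, target_next, draft_next, num_new=20)
    print("same output as greedy:", out == greedy_decode(prompt, target_next, 20))
    print("verification passes:", passes, "vs. 20 sequential target calls")
```

Because every kept token is one the target itself would have chosen, the output is identical to plain greedy decoding; the gain comes from needing fewer target passes. How many tokens each pass yields depends on how often the draft agrees with the target, which is why the abstract stresses draft-model quality.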