AI & ML Open Release

SpecForge provides an open-source framework and high-quality draft models (SpecBundle) to make speculative decoding production-ready.

March 20, 2026

Original Paper

SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding

Shenggui Li, Chao Wang, Yikai Zhu, Yubo Wang, Fan Yin, Shuai Shi, Yefei Chen, Xiaomin Dong, Qiaoling Chen, Jin Pan, Ji Li, Laixin Xie, Yineng Zhang, Lei Yu, Yonggang Wen, Ivor Tsang, Tianwei Zhang

arXiv · 2603.18567

The Takeaway

SpecForge opens up the training of state-of-the-art speculative decoding models (EAGLE-3), which previously lacked scalable infrastructure. The release of pre-trained draft models for mainstream LLMs lets production deployments adopt speculative decoding immediately, with a reported 4.48x inference speedup.

From the abstract

Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tokens for batched verification. However, its adoption has been limited by the lack of high-quality draft models and scalable training infrastructure. We introduce SpecForge, an open-source, production-oriented framework for training speculative decoding models with full support for EAGLE-3. SpecForge …
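
The draft-then-verify loop described in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming greedy decoding and toy deterministic "models"; it is not SpecForge's API and not EAGLE-3. The names `speculative_decode`, `greedy_decode`, `toy_target`, and `toy_draft` are hypothetical, and the per-position verification loop stands in for what a real system would do in one batched forward pass of the target model.

```python
from typing import Callable, List

# Illustrative assumption: a "model" is a function that maps a token
# sequence to its greedy next token. Real models return logits.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], num_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       num_new_tokens: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target checks them."""
    tokens = list(prompt)
    goal = len(prompt) + num_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target checks each proposed position. A real system scores
        #    all k positions in a single batched pass; this loop simulates it.
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the target's next token is a bonus.
            accepted.append(target(tokens + proposal))

        tokens.extend(accepted)
    return tokens[:goal]


# Toy stand-ins (hypothetical): the draft agrees with the target except
# when the sequence length is a multiple of 4.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + 1) % 17


def toy_draft(seq: List[int]) -> int:
    guess = (seq[-1] * 3 + 1) % 17
    return guess if len(seq) % 4 else (guess + 1) % 17
```

Calling `speculative_decode(toy_target, toy_draft, [1, 2, 3], 20)` produces the same tokens as `greedy_decode(toy_target, [1, 2, 3], 20)`: under greedy acceptance the output is unchanged, and any speedup comes from the target verifying several tokens per round. How many draft tokens are accepted per round depends on draft quality, which is why well-trained draft models matter.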

