AI & ML Efficiency Breakthrough

MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.

March 20, 2026

Original Paper

MineDraft: A Framework for Batch Parallel Speculative Decoding

Zhenwei Tang, Arun Verma, Zijian Zhou, Zhaoxuan Wu, Alok Prakash, Daniela Rus, Bryan Kian Hsiang Low

arXiv · 2603.18016

The Takeaway

Standard speculative decoding is bottlenecked by the sequential nature of drafting and verification; this framework parallelizes them across batches to hide drafting latency. It is implemented as a vLLM plugin, making it immediately practical for high-throughput production inference systems.
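To make the latency-hiding argument concrete, here is a minimal timing sketch. It is an idealized model, not MineDraft's implementation: it assumes fixed per-step costs, perfect overlap, and no resource contention between the two stages. The function names and the millisecond figures are hypothetical and are not taken from the paper.

```python
def sequential_time(steps: int, t_draft: float, t_verify: float) -> float:
    """Total time when every step drafts and then verifies, with no overlap."""
    return steps * (t_draft + t_verify)


def overlapped_time(steps: int, t_draft: float, t_verify: float) -> float:
    """Total time when drafting for one batch overlaps verification of another.

    Idealized assumption: the first draft and the last verification cannot be
    hidden; every slot in between costs only as much as the slower stage.
    """
    if steps == 0:
        return 0.0
    return t_draft + (steps - 1) * max(t_draft, t_verify) + t_verify


if __name__ == "__main__":
    # Hypothetical costs in milliseconds, chosen only for illustration.
    steps, t_draft, t_verify = 100, 3.0, 7.0
    seq = sequential_time(steps, t_draft, t_verify)
    par = overlapped_time(steps, t_draft, t_verify)
    print(f"sequential: {seq:.0f} ms, overlapped: {par:.0f} ms, ratio: {seq / par:.2f}x")
```

Under this model the gain is largest when drafting and verification cost about the same, and it shrinks as one stage dominates, since only the shorter stage can be hidden.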

From the abstract

Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. However, the performance of standard SD is often limited by the strictly sequential execution of these drafting and verification stages. To address this, this paper proposes MineDraft, a batch parallel speculative decoding (PSD) framework designed to effectively hide drafting latency by overlapping it with verification…
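The draft-then-verify loop described in the abstract can be sketched as a single standard SD step. This is a simplified illustration under stated assumptions, not the paper's code: it uses greedy token matching instead of the probabilistic acceptance rule used in practice, and the "models" are toy functions (toy_draft, toy_target) invented here so the example runs on its own.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token context to the next token


def speculative_step(prefix: List[int], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[int]:
    """Run one draft-then-verify step and return the tokens that were accepted."""
    # Drafting stage: the small model proposes k tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verification stage: keep the longest prefix the target model agrees with.
    # (A real system scores all k positions in one target forward pass.)
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # target's correction replaces the bad draft
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # all drafts accepted: one extra target token
    return accepted


def toy_target(ctx: List[int]) -> int:
    """Hypothetical target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Hypothetical draft model: agrees with the target except right after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([1], toy_draft, toy_target, k=4))
```

In this step the verification stage cannot begin until drafting has finished, which is the sequential dependency the abstract identifies as the limiting factor.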
