MineDraft achieves a 75% throughput increase in speculative decoding by overlapping the drafting and verification stages.
March 20, 2026
Original Paper
MineDraft: A Framework for Batch Parallel Speculative Decoding
arXiv · 2603.18016
The Takeaway
Standard speculative decoding is bottlenecked because drafting and verification run strictly one after the other; MineDraft overlaps the two stages across batches so that drafting latency is hidden behind verification. It is implemented as a vLLM plugin, which makes it practical to adopt in high-throughput production inference systems.
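To see why overlapping the stages helps, here is a back-of-the-envelope timing model. It is only a sketch and not the paper's method or code. It assumes that in steady state one batch is being drafted while another is being verified, that the two stages do not contend for the same hardware, and that stage times are fixed. The function names and the example timings are hypothetical and are not measurements from the paper.

```python
def sequential_step_time(t_draft: float, t_verify: float) -> float:
    """Per-step cost when verification waits for drafting to finish."""
    return t_draft + t_verify


def overlapped_step_time(t_draft: float, t_verify: float) -> float:
    """Idealized steady-state per-step cost when drafting for one batch
    runs concurrently with verification for another batch.

    Assumption: perfect overlap with no contention, so the slower
    stage alone determines the step time.
    """
    return max(t_draft, t_verify)


def ideal_speedup(t_draft: float, t_verify: float) -> float:
    """Upper bound on the throughput gain under this simplified model."""
    return sequential_step_time(t_draft, t_verify) / overlapped_step_time(
        t_draft, t_verify
    )


if __name__ == "__main__":
    # Hypothetical stage times in milliseconds, chosen only for illustration.
    print(ideal_speedup(t_draft=1.0, t_verify=4.0))
```

Under these assumptions the gain depends on how large drafting is relative to verification: it approaches 2x when the two stages take equal time and shrinks toward 1x as drafting becomes negligible. Real systems fall short of this bound because of scheduling overhead and shared-resource contention.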
From the abstract
Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to propose draft tokens that are subsequently verified by a larger target model. However, the performance of standard SD is often limited by the strictly sequential execution of these drafting and verification stages. To address this, this paper proposes MineDraft, a batch parallel speculative decoding (PSD) framework designed to effectively hide drafting latency by overlapping it with verificatio