DeepSeek has shipped DSpark, a speculative decoding framework that speeds up inference on its V4 models by up to 80 percent, and open-sourced the full training and evaluation stack as DeepSpec. The update, reported June 27, attaches a decoding module to the existing DeepSeek-V4-Pro checkpoint rather than retraining the model, so the speed gain comes free of any quality trade-off.

What Happened

DSpark is now active in both DeepSeek-V4 Flash and Pro. It uses a semi-autoregressive draft model and confidence-scheduled verification to cut the GPU stalls that normally bottleneck token generation. In DeepSeek's tests, described in the accompanying technical report, DSpark raised acceptance lengths (the number of drafted tokens that survive each verification pass) by 16.3 to 30.9 percent over rival speculative decoding methods such as Eagle3 and DFlash. The whole codebase, including data prep, draft model training, and evaluation scripts, is released under an MIT license.
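To make the acceptance-length figure concrete, here is a minimal sketch of how such a metric could be computed from a decoding log. The function names and the sample logs are hypothetical, not taken from DeepSpec, and some papers also count the extra token the full model emits on each pass, so check the technical report for the exact definition it uses.

```python
from statistics import mean
from typing import List


def mean_acceptance_length(accepted_per_step: List[int]) -> float:
    """Average number of draft tokens accepted per verification pass."""
    return mean(accepted_per_step)


def relative_gain_percent(baseline: float, candidate: float) -> float:
    """Percentage improvement of candidate over baseline."""
    return (candidate / baseline - 1.0) * 100.0


# Made-up logs: draft tokens accepted on each verification pass.
baseline_log = [2, 3, 1, 4, 2, 3]    # hypothetical rival method
candidate_log = [3, 4, 2, 4, 3, 3]   # hypothetical improved method

baseline_len = mean_acceptance_length(baseline_log)    # 15 / 6 = 2.5
candidate_len = mean_acceptance_length(candidate_log)  # 19 / 6, about 3.17
gain = relative_gain_percent(baseline_len, candidate_len)
print(f"{baseline_len:.2f} -> {candidate_len:.2f} tokens per pass ({gain:.1f}% gain)")
```

A longer acceptance length means each expensive pass through the full model yields more usable tokens, which is where the speedup comes from.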

Why It Matters

Speculative decoding lets a small draft model propose several tokens ahead, which the full model then verifies in a single pass. Any drafted token the full model would not have produced itself is thrown away, so faster inference does not mean a cheaper, dumber model. For anyone building on DeepSeek, a speedup of up to 80 percent translates into lower latency for chat, coding agents, and batch generation. Faster generation also shortens the time each long-context request holds GPU memory, which matters when DeepSeek-V4 already carries a 1M-token window. The release sits on top of DeepSeek's aggressive pricing, covered in our look at the permanent V4-Pro discount, and joins a growing shelf of open DeepSeek releases that keep undercutting closed labs on cost.
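The draft-then-verify loop can be sketched with two toy "models" that are just deterministic Python functions. This is a generic illustration of greedy speculative decoding, not DSpark's semi-autoregressive or confidence-scheduled variant; `target_next`, `draft_next`, and the draft length `k` are all assumptions made for the example. In a real system the verification loop is one batched forward pass of the full model rather than k separate calls.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # context tokens -> next token (greedy)


def target_next(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model: an arbitrary fixed rule.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    # Hypothetical stand-in for the small draft model: it agrees with the
    # target except when the context length is a multiple of 4.
    if len(ctx) % 4 == 0:
        return (target_next(ctx) + 1) % 50
    return target_next(ctx)


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], List[int]]:
    out = list(prompt)
    goal = len(prompt) + n_new
    accepted_per_step: List[int] = []
    while len(out) < goal:
        # 1. The draft model proposes k tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks every proposed position.
        accepted = 0
        for i in range(k):
            expected = target(out + proposal[:i])
            if proposal[i] != expected:
                # Keep the matching prefix, then the target's own token.
                out.extend(proposal[:i])
                out.append(expected)
                break
            accepted += 1
        else:
            # All k drafts matched: keep them plus one extra target token.
            out.extend(proposal)
            out.append(target(out))
        accepted_per_step.append(accepted)
    return out[:goal], accepted_per_step


prompt = [1, 2, 3]
plain = greedy_decode(target_next, prompt, 20)
fast, accepted = speculative_decode(target_next, draft_next, prompt, 20)
print(fast == plain)  # the two decoders produce the same tokens
print(accepted)       # draft tokens accepted on each verification pass
```

Every token that reaches the output is one the target model would have chosen for that prefix, which is why the result matches plain greedy decoding while needing fewer target passes.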

Key Details

Because DSpark is a module bolted onto an unchanged checkpoint, the outputs are identical to standard DeepSeek-V4-Pro, just generated faster. The open-source DeepSpec repo is the more interesting release for builders: it is a full-stack reference for training your own speculative decoding draft models, not just a way to run DeepSeek's. Early coverage from industry trackers frames it as a direct challenge to closed inference stacks, since the acceptance-length gains are reproducible from the published code.

What to Do Next

If you self-host DeepSeek, pull the DSpark checkpoint and benchmark generation speed, measured in tokens per second on your own prompts, against your current setup before committing it to production. Teams running DeepSeek through a coding agent, like the workflow in our Reasonix breakdown, should see the biggest wins on long, multi-step tasks. Builders who train their own models can study DeepSpec to add speculative decoding to a different base model entirely, using the published data-prep and evaluation scripts as a template rather than building the draft-model pipeline from scratch.
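A before-and-after benchmark can be as simple as the sketch below. It assumes you can wrap each serving setup in a `generate(prompt, max_new_tokens)` callable that returns the generated tokens; that wrapper, and the `fake_generate` stand-in used for the demo, are hypothetical and not part of DSpark or DeepSpec.

```python
import time
from typing import Callable, List

Generate = Callable[[str, int], List[int]]


def tokens_per_second(generate: Generate, prompt: str,
                      max_new_tokens: int, runs: int = 3) -> float:
    """Average generation speed over several timed runs."""
    generate(prompt, max_new_tokens)  # warm-up run, not timed
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt, max_new_tokens)
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    return total_tokens / total_seconds


def speedup_percent(baseline_tps: float, candidate_tps: float) -> float:
    """How much faster the candidate is, as a percentage of the baseline."""
    return (candidate_tps / baseline_tps - 1.0) * 100.0


def time_saved_percent(speedup_pct: float) -> float:
    """Reduction in wall-clock time for the same number of tokens."""
    return (1.0 - 1.0 / (1.0 + speedup_pct / 100.0)) * 100.0


def fake_generate(prompt: str, max_new_tokens: int) -> List[int]:
    # Placeholder for a real call into your serving stack.
    time.sleep(0.01)
    return list(range(max_new_tokens))


tps = tokens_per_second(fake_generate, "hello", 64)
print(f"{tps:.0f} tokens/s")
print(f"{time_saved_percent(80.0):.1f}% less time at an 80% speedup")
```

If the headline figure is read as a throughput gain, 80 percent faster works out to roughly 44 percent less wall-clock time for the same output, not 80 percent less. Run both setups with the same prompts, output lengths, and batch sizes, since speculative decoding gains vary with the workload.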