LLM / Diffusion / Code
BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding
Authors: Hao Zhang, Yiming Hu, Yong Wang, Mingqiao Mo, Xin Xiao, Xiangxiang Chu
arXiv ID: 2606.31315
Problem: Existing diffusion-based speculative decoding methods use one fixed block size for every input, even though the optimal block size varies from sample to sample.
Key Methodology:
- Formulates block-size selection as a lightweight policy learning problem, predicting the optimal block size from the prefilling-stage representation in a single forward pass
- Leverages the observation that optimal block sizes exhibit a clear local structure concentrated around the training block size, reducing the problem to a low-dimensional, structured decision space
- Plug-and-play design that introduces minimal overhead and integrates seamlessly with existing diffusion-based speculative decoders
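The methodology above can be illustrated with a minimal sketch. This is not the authors' implementation: the training block size of 16, the offset window, the mean-pooling step, the linear head, and all names (`BlockSizePolicy`, `OFFSETS`, `predict_block_size`) are assumptions chosen for illustration, and the weights are random rather than learned. It only shows the shape of the idea: one cheap forward pass over the prefill representation selects a block size from a small set of candidates clustered around the training block size.

```python
import numpy as np

# Assumed values for illustration; the summary does not state them.
TRAIN_BLOCK_SIZE = 16
OFFSETS = np.array([-4, -2, 0, 2, 4])  # hypothetical local window around the training size


class BlockSizePolicy:
    """Hypothetical linear softmax head over a few candidate block sizes."""

    def __init__(self, hidden_dim, rng):
        # Random weights stand in for a trained policy.
        self.W = rng.normal(scale=0.02, size=(hidden_dim, len(OFFSETS)))
        self.b = np.zeros(len(OFFSETS))

    def logits(self, prefill_hidden):
        # Mean-pool the prompt tokens: (seq_len, hidden_dim) -> (hidden_dim,)
        pooled = prefill_hidden.mean(axis=0)
        return pooled @ self.W + self.b

    def predict_block_size(self, prefill_hidden):
        # Single forward pass: pick the highest-scoring candidate offset.
        idx = int(np.argmax(self.logits(prefill_hidden)))
        return int(TRAIN_BLOCK_SIZE + OFFSETS[idx])


rng = np.random.default_rng(0)
policy = BlockSizePolicy(hidden_dim=64, rng=rng)

# Stand-in for the prefilling-stage hidden states of a 12-token prompt.
prefill_hidden = rng.normal(size=(12, 64))
block_size = policy.predict_block_size(prefill_hidden)
candidates = (TRAIN_BLOCK_SIZE + OFFSETS).tolist()
print(f"chosen block size: {block_size} (candidates: {candidates})")
```

Restricting the output to a handful of offsets is what keeps the decision space low-dimensional in this sketch: the head has only five outputs, so its cost is negligible next to the prefill pass that produces its input.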
Key Results: Achieves an acceptance length of 5.92 and a 4.20× speedup on Qwen3-4B at temperature T=1, consistently outperforming fixed-block-size baselines across diverse settings.
Applied Context: Builders deploying LLMs with a diffusion-based speculative decoder can drop BlockPilot into the existing inference pipeline for faster generation with no accuracy loss. The gain matters most in latency-sensitive applications such as real-time chat and code completion, where the best speculation length varies from request to request.