[2509.22134] Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Computer Science > Computation and Language
arXiv:2509.22134 (cs)
[Submitted on 26 Sep 2025 (v1), last revised 28 Feb 2026 (this version, v2)]

Title: Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Authors: Shijing Hu, Jingyang Li, Zhihui Lu, Pan Zhou

Abstract: Speculative decoding accelerates large language model (LLM) inference by letting a lightweight draft model propose multiple tokens that the target model verifies in parallel. Yet existing training objectives optimize only a single greedy draft path, while decoding follows a tree policy that re-ranks and verifies multiple branches. This draft policy misalignment limits achievable speedups. We introduce Group Tree Optimization (GTO), which aligns training with the decoding-time tree policy through two components: (i) Draft Tree Reward, a sampling-free objective equal to the expected acceptance length of the draft tree under the target model, directly measuring decoding performance; (ii) Group-based Draft Policy Training, a stable optimization scheme that contrasts trees from the current and a frozen reference draft model, forming debiased group-standardized advantages and applying a PPO-style surrogate along the longest accepted sequence for robust updates. We further prove that increasing our Draft Tree Reward...
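The two ingredients of Group-based Draft Policy Training named in the abstract, debiased group-standardized advantages and a PPO-style clipped surrogate, can be sketched in a few lines. This is a minimal illustration under assumptions not spelled out in the abstract: each tree in a group yields a scalar reward (its Draft Tree Reward), and `logp_new`/`logp_ref` are per-tree log-probabilities of the longest accepted sequence under the current and frozen reference draft models. All function names are illustrative, not the paper's API.

```python
import numpy as np

def group_standardized_advantages(rewards, eps=1e-8):
    """Debias rewards within a group: subtract the group mean and
    divide by the group std, so advantages are zero-mean."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def ppo_clipped_surrogate(logp_new, logp_ref, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective (to be maximized).
    The probability ratio is taken between the current draft model
    and the frozen reference, and clipped to limit the update size."""
    ratio = np.exp(np.asarray(logp_new) - np.asarray(logp_ref))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: take the elementwise minimum, then average.
    return np.minimum(unclipped, clipped).mean()
```

With identical current and reference log-probabilities the ratio is 1 everywhere, so the surrogate reduces to the mean advantage, which is zero after group standardization; the objective only moves when the models disagree on trees with nonzero advantage.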