[2602.06932] When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
Computer Science > Machine Learning

arXiv:2602.06932 (cs)

[Submitted on 6 Feb 2026 (v1), last revised 3 Apr 2026 (this version, v2)]

Title: When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

Authors: Junxiong Wang, Fengxiang Bie, Jisen Li, Zhongzhu Zhou, Zelei Shao, Yubo Wang, Yinghui Liu, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu, Xiaoxia Wu

Abstract: Speculative decoding can significantly accelerate LLM serving, yet most deployments today decouple speculator training from serving, treating speculator training as a standalone offline modeling problem. We show that this decoupled formulation introduces substantial deployment and adaptation lag: (1) high time-to-serve, since a speculator must be trained offline for a considerable period before deployment; (2) delayed utility feedback, since the true end-to-end decoding speedup is only known after training and cannot be inferred reliably from acceptance rate alone due to model-architecture and system-level overheads; and (3) domain-drift degradation, as the target model is repurposed to new domains and the speculator becomes stale and less effective. To address these issues, we present Aurora, a unified training-serving system tha...
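Point (2) of the abstract notes that acceptance rate alone does not determine end-to-end speedup. A minimal sketch of why, using the standard expected-tokens analysis for speculative decoding; the draft length `k` and the relative draft-pass cost `c` are illustrative parameters, not values from the paper:

```python
def expected_tokens(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step when each of k
    drafted tokens is accepted independently with probability alpha
    (geometric-series form from the speculative-sampling analysis)."""
    return (1 - alpha ** (k + 1)) / (1 - alpha)

def end_to_end_speedup(alpha: float, k: int, c: float) -> float:
    """Speedup over plain decoding: one step costs k speculator
    forward passes (each c times a target pass) plus one target
    verification pass. c captures architecture/system overheads."""
    return expected_tokens(alpha, k) / (k * c + 1)

# Identical acceptance rate, different system overheads -> very
# different realized speedups, so acceptance rate alone is not enough.
print(round(end_to_end_speedup(alpha=0.8, k=4, c=0.05), 2))  # cheap draft
print(round(end_to_end_speedup(alpha=0.8, k=4, c=0.25), 2))  # costly draft
```

Under this model a speculator with 80% acceptance yields roughly 2.8x speedup when a draft pass costs 5% of a target pass, but only about 1.7x when it costs 25%, which is the gap the abstract attributes to model-architecture and system-level overheads.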