[2602.06932] When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
Computer Science > Machine Learning

arXiv:2602.06932 (cs)

[Submitted on 6 Feb 2026 (v1), last revised 3 Apr 2026 (this version, v2)]

Title: When RL Meets Adaptive Speculative Training: A Unified Training-Serving System

Authors: Junxiong Wang, Fengxiang Bie, Jisen Li, Zhongzhu Zhou, Zelei Shao, Yubo Wang, Yinghui Liu, Qingyang Wu, Avner May, Sri Yanamandra, Yineng Zhang, Ce Zhang, Tri Dao, Percy Liang, Ben Athiwaratkun, Shuaiwen Leon Song, Chenfeng Xu, Xiaoxia Wu

Abstract: Speculative decoding can significantly accelerate LLM serving, yet most deployments today decouple speculator training from serving, treating speculator training as a standalone offline modeling problem. We show that this decoupled formulation introduces substantial deployment and adaptation lag: (1) high time-to-serve, since a speculator must be trained offline for a considerable period before deployment; (2) delayed utility feedback, since the true end-to-end decoding speedup is only known after training and cannot be inferred reliably from acceptance rate alone due to model-architecture and system-level overheads; and (3) domain-drift degradation, as the target model is repurposed to new domains and the speculator becomes stale and less effective. To address these issues, we present Aurora, a unified training-serving system tha...
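Point (2) of the abstract notes that acceptance rate alone does not determine end-to-end speedup. A minimal sketch of why, using the standard expected-tokens analysis for speculative decoding; the draft length `k` and the relative draft-pass cost `c` are illustrative parameters, not values from the paper:

```python
def expected_tokens(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step when each of k
    drafted tokens is accepted independently with probability alpha
    (geometric-series form from the speculative-sampling analysis)."""
    return (1 - alpha ** (k + 1)) / (1 - alpha)

def end_to_end_speedup(alpha: float, k: int, c: float) -> float:
    """Speedup over plain decoding: one step costs k speculator
    forward passes (each c times a target pass) plus one target
    verification pass. c captures architecture/system overheads."""
    return expected_tokens(alpha, k) / (k * c + 1)

# Identical acceptance rate, different system overheads -> very
# different realized speedups, so acceptance rate alone is not enough.
print(round(end_to_end_speedup(alpha=0.8, k=4, c=0.05), 2))  # cheap draft
print(round(end_to_end_speedup(alpha=0.8, k=4, c=0.25), 2))  # costly draft
```

Under this model a speculator with 80% acceptance yields roughly 2.8x speedup when a draft pass costs 5% of a target pass, but only about 1.7x when it costs 25%, which is the gap the abstract attributes to model-architecture and system-level overheads.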