[2512.21323] Parallel Token Prediction for Language Models
Computer Science > Computation and Language
arXiv:2512.21323 (cs)
[Submitted on 24 Dec 2025 (v1), last revised 4 Mar 2026 (this version, v2)]

Title: Parallel Token Prediction for Language Models
Authors: Felix Draxler, Justus Will, Farrin Marouf Sofian, Theofanis Karaletsos, Sameer Singh, Stephan Mandt

Abstract: Autoregressive decoding in language models is inherently slow, generating only one token per forward pass. We propose Parallel Token Prediction (PTP), a general-purpose framework for predicting multiple tokens in a single model call. PTP moves the source of randomness from post-hoc sampling to random input variables, making future tokens deterministic functions of those inputs and thus jointly predictable in a single forward pass. We prove that a single PTP call can represent arbitrary dependencies between tokens. PTP is trained by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, PTP achieves a 2.4x speedup on a diverse-task speculative decoding benchmark. We provide code and checkpoints at this https URL.

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2512.21323 [cs.CL] (or arXiv:2512.21323v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2512.21323
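To make the reparameterization idea in the abstract concrete, the toy sketch below illustrates how moving randomness from post-hoc sampling into random input variables lets several future tokens be read off from a single forward pass. This is not the authors' PTP architecture; all names, shapes, and the Gumbel-max construction here are assumptions chosen purely for illustration.

    # Illustrative sketch only (assumed design, not the paper's model): pre-sample
    # Gumbel noise for k future positions, feed it to the model as an input, and
    # decode each position with an argmax. The tokens are then a deterministic
    # function of (prefix representation, noise), so all k come from one call.
    import torch

    vocab_size, hidden_dim, k = 100, 32, 4  # toy sizes (assumptions)

    class ToyParallelHead(torch.nn.Module):
        """Maps a prefix representation plus per-position noise to k token ids."""
        def __init__(self):
            super().__init__()
            self.proj = torch.nn.Linear(hidden_dim + vocab_size, vocab_size)

        def forward(self, prefix_repr, gumbel_noise):
            # prefix_repr: (hidden_dim,), gumbel_noise: (k, vocab_size)
            h = prefix_repr.expand(k, -1)  # broadcast the prefix to k positions
            logits = self.proj(torch.cat([h, gumbel_noise], dim=-1))
            # The noise acts both as a conditioning input (so later positions can
            # depend on earlier random choices) and in the final argmax; no
            # post-hoc sampling is needed, the randomness lives in the inputs.
            return torch.argmax(logits + gumbel_noise, dim=-1)  # (k,) token ids

    torch.manual_seed(0)
    head = ToyParallelHead()
    prefix_repr = torch.randn(hidden_dim)
    noise = -torch.log(-torch.log(torch.rand(k, vocab_size)))  # standard Gumbel samples
    tokens = head(prefix_repr, noise)
    print(tokens.tolist())  # same noise -> same tokens, drawn in one forward pass

In the sketch, re-running the last three lines with the same noise reproduces the same k tokens, while fresh noise yields a new joint sample; that determinism given the inputs is what allows all positions to be predicted together rather than one per forward pass.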