[2603.20219] Thinking into the Future: Latent Lookahead Training for Transformers
Computer Science > Computation and Language
arXiv:2603.20219 (cs)
[Submitted on 3 Mar 2026]

Title: Thinking into the Future: Latent Lookahead Training for Transformers
Authors: Lorenzo Noci, Gregor Bachmann, Seyed-Mohsen Moosavi-Dezfooli, Moin Nabi

Abstract: Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. Although highly scalable, this objective forces the model to commit at every step, preventing it from exploring or reflecting on multiple plausible continuations. Furthermore, compute is allocated uniformly across tokens: every token is produced by a single forward pass, potentially limiting the model's expressiveness when difficult tokens inherently require more compute. To address these limitations, we introduce latent lookahead, a training strategy that enables models to "think" before generating: at selected positions in the sequence, before committing to the next token, the model performs a multi-step lookahead in latent space. More precisely, instead of sampling future tokens, we leverage the network's latent space by recursively feeding its hidden states back into the context for $\tau$ steps, investing more compute in predicting that token. This produces $\tau$ latent predictions that are supervised against...
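
The following is a minimal, hedged sketch of how such a latent lookahead training step could look in PyTorch. It is not the authors' implementation: the model names (`LatentLookaheadLM`, `backbone`, `lm_head`), the choice to feed hidden states back directly as input embeddings, and the assumption that the $\tau$ latent predictions are supervised against the next $\tau$ future tokens are all illustrative guesses based only on the (truncated) abstract.

```python
# Illustrative sketch only; all names and the exact supervision scheme are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentLookaheadLM(nn.Module):
    """Toy causal LM that can 'think' for tau latent steps at chosen positions."""

    def __init__(self, vocab_size=1000, d_model=128, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, 4 * d_model, batch_first=True, norm_first=True
        )
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, embeddings):
        # Causal self-attention over a sequence of (token or latent) embeddings.
        T = embeddings.size(1)
        mask = torch.triu(
            torch.full((T, T), float("-inf"), device=embeddings.device), diagonal=1
        )
        return self.backbone(embeddings, mask=mask)


def lookahead_loss(model, tokens, pos, tau):
    """Standard next-token loss plus tau latent-lookahead losses at position `pos`.

    At `pos`, the last hidden state is recursively fed back into the context as an
    extra input embedding for tau steps; each latent step is supervised against the
    corresponding future token (one plausible reading of the truncated abstract).
    """
    x = model.embed(tokens[:, :-1])                 # (B, T-1, D) input embeddings
    h = model(x)                                    # ordinary causal forward pass
    loss = F.cross_entropy(model.lm_head(h).transpose(1, 2), tokens[:, 1:])

    context = x[:, : pos + 1]                       # prefix up to the chosen position
    latent = h[:, pos]                              # hidden state used for "thinking"
    for step in range(tau):
        context = torch.cat([context, latent.unsqueeze(1)], dim=1)
        latent = model(context)[:, -1]              # recurse in latent space
        target = tokens[:, pos + 1 + step]          # supervise against future tokens
        loss = loss + F.cross_entropy(model.lm_head(latent), target)
    return loss


if __name__ == "__main__":
    torch.manual_seed(0)
    model = LatentLookaheadLM()
    tokens = torch.randint(0, 1000, (2, 16))        # dummy batch of token ids
    print(lookahead_loss(model, tokens, pos=5, tau=3))
```

In this sketch the prefix is re-encoded at every latent step for clarity; a practical implementation would presumably cache key/value states and select the lookahead positions according to some criterion the abstract does not specify.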