[2602.20528] Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
Summary
The paper presents STAR-LDM, a novel language model that integrates latent diffusion planning with autoregressive generation, enhancing narrative coherence and commonsense reasoning.
Why It Matters
This research advances language modeling by letting a model refine a semantic plan in continuous latent space before committing to discrete tokens, potentially improving the coherence and quality of AI-generated narratives. As AI applications expand, such innovations are important for developing more context-aware language models.
Key Takeaways
- STAR-LDM enhances language modeling by incorporating a 'thinking' phase for better semantic planning.
- The model outperforms similar-sized models on language understanding benchmarks.
- It achieves over 70% win rates in LLM-as-judge evaluations of narrative coherence and commonsense reasoning.
- Lightweight classifiers enable fine-grained control of attributes without retraining the model.
- The architecture balances fluency and control better than existing specialized approaches.
Abstract
Computer Science > Computation and Language · arXiv:2602.20528 · Submitted on 24 Feb 2026
Title: Stop-Think-AutoRegress: Language Modeling with Latent Diffusion Planning
Authors: Justin Lovelace, Christian Belardi, Sofian Zalouk, Adhitya Polavaram, Srivatsa Kundurthy, Kilian Q. Weinberger
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
The Stop-Think-AutoRegress Language Diffusion Model (STAR-LDM) integrates latent diffusion planning with autoregressive generation. Unlike conventional autoregressive language models limited to token-by-token decisions, STAR-LDM incorporates a "thinking" phase that pauses generation to refine a semantic plan through diffusion before continuing. This enables global planning in continuous space prior to committing to discrete tokens. Evaluations show STAR-LDM significantly outperforms similar-sized models on language understanding benchmarks and achieves >70% win rates in LLM-as-judge comparisons for narrative coherence and commonsense reasoning. The architecture also allows straightforward control through lightweight classifiers, enabling fine-grained steering of attributes without model retraining while maintaining better fluency-control trade-offs than specialized approaches.
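The alternation the abstract describes, pausing autoregressive decoding to refine a latent plan via diffusion, can be sketched at a very high level. The sketch below is illustrative only: `diffusion_refine`, `autoregress`, and all dimensions and step counts are toy stand-ins invented here, not the paper's actual architecture or interfaces.

```python
# Hedged, toy sketch of a Stop-Think-AutoRegress generation loop.
# Every component here is a hypothetical stand-in; the real STAR-LDM
# model, its planner, and its decoder are not specified in this summary.
import random

def diffusion_refine(plan, steps=5):
    """Toy 'thinking' phase: iteratively denoise a continuous latent plan.
    Stands in for the paper's latent diffusion planner."""
    for t in range(steps, 0, -1):
        noise_scale = t / steps
        # add noise, then pull toward a cleaner estimate (toy denoising)
        plan = [0.9 * (p + random.gauss(0.0, 0.01) * noise_scale) for p in plan]
    return plan

def autoregress(tokens, plan, n_tokens):
    """Toy autoregressive decoding; a real model would condition
    next-token logits on the refined latent `plan`."""
    for _ in range(n_tokens):
        tokens.append(f"tok{len(tokens)}")
    return tokens

def stop_think_autoregress(n_segments=3, seg_len=4, plan_dim=8):
    tokens, plan = [], [0.0] * plan_dim
    for _ in range(n_segments):
        plan = diffusion_refine(plan)                 # Stop & Think
        tokens = autoregress(tokens, plan, seg_len)   # AutoRegress
    return tokens

out = stop_think_autoregress()  # 3 segments x 4 tokens = 12 tokens
```

The design point the sketch captures is the interleaving: global planning happens in continuous space between bursts of discrete token emission, rather than one token-level decision at a time.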