[2604.06155] Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

arXiv - Machine Learning · April 08, 2026

About this article

Abstract page for arXiv paper 2604.06155: Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

Computer Science > Machine Learning

arXiv:2604.06155 (cs)

[Submitted on 7 Apr 2026]

Title: Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement

Authors: Qimin Zhong, Hao Liao, Haiming Qin, Mingyang Zhou, Rui Mao, Wei Chen, Naipeng Chao

Abstract: Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical perspective analyzing the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes the convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, where discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories. Experiments on synthetic graphs and real-world Manhattan Taxi Ride show that LSE-MTP effectively bridges the gap between discret...
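The contrast the abstract draws between one-step-ahead NTP supervision and MTP supervision can be illustrated with a minimal sketch of how training targets might be built from a token sequence. This is not the paper's implementation: the function names `ntp_targets` and `mtp_targets`, the horizon parameter `k`, and the toy sequence are all assumptions made for illustration, and the sketch covers only target construction, not the LSE-MTP latent anchoring.

```python
from typing import List, Tuple


def ntp_targets(tokens: List[int]) -> List[Tuple[int, int]]:
    """Pair each token with the single token that follows it (one-step-ahead)."""
    return [(tok, tokens[i + 1]) for i, tok in enumerate(tokens[:-1])]


def mtp_targets(tokens: List[int], k: int) -> List[Tuple[int, List[int]]]:
    """Pair each token with the next k tokens.

    Positions too close to the end to have a full k-token window are skipped
    (a simplifying assumption of this sketch).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [(tokens[i], tokens[i + 1 : i + 1 + k]) for i in range(len(tokens) - k)]


# Hypothetical toy sequence, e.g. node ids visited along a path in a graph.
path = [3, 7, 2, 9, 4]
ntp = ntp_targets(path)
mtp = mtp_targets(path, k=2)
print(ntp)  # [(3, 7), (7, 2), (2, 9), (9, 4)]
print(mtp)  # [(3, [7, 2]), (7, [2, 9]), (2, [9, 4])]
```

With `k=1` the MTP targets reduce to the NTP pairs (wrapped in one-element lists), which is one way to see NTP as the shortest-horizon case of the same supervision scheme.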

Originally published on April 08, 2026. Curated by AI News.

