[2603.01331] MetaState: Persistent Working Memory for Discrete Diffusion Language Models
Computer Science > Computation and Language

arXiv:2603.01331 (cs)
[Submitted on 2 Mar 2026]

Title: MetaState: Persistent Working Memory for Discrete Diffusion Language Models
Authors: Kejing Xia, Mingzhe Li, Lixuan Wei, Zhenbang Du, Xiangchi Yuan, Qirui Jin, Wenke Lee

Abstract: Discrete diffusion language models (dLLMs) generate text by iteratively denoising a masked sequence. Compared with autoregressive models, this paradigm naturally supports parallel decoding, bidirectional context, and flexible generation patterns. However, standard dLLMs condition each denoising step only on the current hard-masked sequence, while intermediate continuous representations are discarded after sampling and remasking. We refer to this bottleneck as the \textbf{Information Island} problem. It leads to redundant recomputation across steps and can degrade cross-step consistency. We address this limitation with \textbf{MetaState}, a lightweight recurrent augmentation that equips a frozen dLLM backbone with a persistent, fixed-size working memory that remains independent of sequence length. \textbf{MetaState} consists of three trainable modules: a cross-attention Mixer that reads backbone activations into memory slots, a GRU-style Updater that integrates information across denoising steps, and a cross-attention Injector that fee...
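The read-update-inject loop described in the abstract can be sketched in NumPy. This is an illustrative toy, not the paper's implementation: the frozen backbone is replaced by random activations, attention is single-head and unnormalized beyond softmax, and all dimensions, module names, and weight shapes are assumptions chosen for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """Single-head cross-attention: `queries` attend over `context`."""
    q, k, v = queries @ Wq, context @ Wk, context @ Wv
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return scores @ v

def gru_update(state, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """GRU-style gated update of the memory slots across denoising steps."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(x @ Wz + state @ Uz)        # update gate
    r = sigmoid(x @ Wr + state @ Ur)        # reset gate
    h = np.tanh(x @ Wh + (r * state) @ Uh)  # candidate state
    return (1.0 - z) * state + z * h

rng = np.random.default_rng(0)
d, m, n, steps = 16, 4, 32, 3  # hidden dim, memory slots, sequence length, denoising steps
W = lambda: 0.1 * rng.standard_normal((d, d))
mixer_w, injector_w = [W() for _ in range(3)], [W() for _ in range(3)]
updater_w = [W() for _ in range(6)]

memory = np.zeros((m, d))  # persistent working memory, fixed-size in m, not n
for step in range(steps):
    acts = rng.standard_normal((n, d))  # stand-in for frozen-backbone activations
    read = cross_attention(memory, acts, *mixer_w)        # Mixer: memory slots read activations
    memory = gru_update(memory, read, *updater_w)         # Updater: integrate across steps
    injected = cross_attention(acts, memory, *injector_w) # Injector: tokens read the memory

print(memory.shape, injected.shape)  # (4, 16) (32, 16)
```

The point the shapes make is the one the abstract emphasizes: the memory stays `(m, d)` regardless of sequence length `n`, so the recurrent state carried between denoising steps has a fixed cost.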