[2602.08676] LLaDA2.1: Speeding Up Text Diffusion via Token Editing
Summary
LLaDA2.1 introduces a novel approach to text diffusion by integrating Token-to-Token editing into the Mask-to-Token scheme, enhancing both decoding speed and output quality.
Why It Matters
This advancement addresses the ongoing challenge in machine learning of balancing speed and quality in text generation. By improving the efficiency of large-scale models, LLaDA2.1 has the potential to significantly impact applications in natural language processing and AI-driven solutions.
Key Takeaways
- LLaDA2.1 combines Token-to-Token editing with Mask-to-Token decoding for enhanced performance.
- Introduces Speedy Mode for faster outputs and Quality Mode for improved accuracy.
- Reports strong results across 33 benchmarks, with particular strength on coding tasks.
- Utilizes a large-scale Reinforcement Learning framework for better reasoning and instruction-following.
- Releases two model variants: LLaDA2.1-Mini (16B) and LLaDA2.1-Flash (100B).
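The joint threshold-decoding idea can be illustrated with a toy loop. This is a minimal sketch, not the authors' implementation: the model stub, the `unmask_tau`/`edit_tau` threshold names, and all token values are illustrative assumptions. The point is only the two branches: Mask-to-Token commits confident proposals at masked positions, while Token-to-Token revises already-committed tokens when the model is highly confident in a different one.

```python
import random

MASK = "<mask>"

def propose(seq, rng):
    # Stand-in for the diffusion model's per-step prediction: a proposed
    # token and a confidence in [0, 1] for every position (illustrative).
    return [(f"tok{i}v{rng.randint(0, 1)}", rng.random())
            for i in range(len(seq))]

def threshold_decode(length=8, unmask_tau=0.6, edit_tau=0.9,
                     max_steps=32, seed=0):
    rng = random.Random(seed)
    seq = [MASK] * length
    for _ in range(max_steps):
        if MASK not in seq:
            break
        for i, (tok, conf) in enumerate(propose(seq, rng)):
            if seq[i] == MASK and conf >= unmask_tau:
                seq[i] = tok          # Mask-to-Token: commit confident tokens
            elif seq[i] not in (MASK, tok) and conf >= edit_tau:
                seq[i] = tok          # Token-to-Token: revise committed tokens
    # Fallback: force-commit anything still masked after the step budget.
    for i, (tok, _) in enumerate(propose(seq, rng)):
        if seq[i] == MASK:
            seq[i] = tok
    return seq

print(threshold_decode())
```

Lowering `unmask_tau` commits more tokens per step (a Speedy-Mode-like setting), while raising it and relying on T2T revision trades steps for accuracy (a Quality-Mode-like setting); the actual mode definitions in the paper may differ.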
Computer Science > Machine Learning
arXiv:2602.08676 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v3)]

Title: LLaDA2.1: Speeding Up Text Diffusion via Token Editing

Authors: Tiwei Bie, Maosong Cao, Xiang Cao, Bingsen Chen, Fuyuan Chen, Kun Chen, Lun Du, Daozhuo Feng, Haibo Feng, Mingliang Gong, Zhuocheng Gong, Yanmei Gu, Jian Guan, Kaiyuan Guan, Hongliang He, Zenan Huang, Juyong Jiang, Zhonghui Jiang, Zhenzhong Lan, Chengxi Li, Jianguo Li, Zehuan Li, Huabin Liu, Lin Liu, Guoshan Lu, Yuan Lu, Yuxin Ma, Xingyu Mou, Zhenxuan Pan, Kaida Qiu, Yuji Ren, Jianfeng Tan, Yiding Tian, Zian Wang, Lanning Wei, Tao Wu, Yipeng Xing, Wentao Ye, Liangyu Zha, Tianze Zhang, Xiaolu Zhang, Junbo Zhao, Da Zheng, Hao Zhong, Wanli Zhong, Jun Zhou, Junlin Zhou, Liwang Zhu, Muzhi Zhu, Yihong Zhuang

Abstract: While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality has remained an elusive frontier. Today, we unveil LLaDA2.1, a paradigm shift designed to transcend this trade-off. By seamlessly weaving Token-to-Token (T2T) editing into the conventional Mask-to-Token (M2T) scheme, we introduce a joint, configurable threshold-decoding scheme. This structural innovation gives rise to two distinct personas: the Speedy ...