[2602.12262] T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
Summary
The paper presents T3D, a framework that improves few-step decoding in diffusion language models by combining trajectory self-distillation with Direct Discriminative Optimization, yielding more efficient text generation.
Why It Matters
As the demand for efficient text generation grows, T3D addresses the challenge of balancing speed and quality in language models. This research could significantly impact the development of faster, more effective AI-driven text generation tools, making it relevant for both academia and industry.
Key Takeaways
- T3D improves few-step decoding in diffusion language models.
- Trajectory self-distillation trains the model on its own generative trajectories, so no external teacher model is needed.
- Direct Discriminative Optimization (DDO), a reverse-KL objective, steers the student toward high-probability teacher modes.
- Across benchmarks, T3D consistently outperforms existing few-step baselines.
- While full-step decoding remains superior, T3D narrows the performance gap.
Computer Science > Computation and Language
arXiv:2602.12262 (cs)
[Submitted on 12 Feb 2026 (v1), last revised 13 Feb 2026 (this version, v2)]
Title: T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
Authors: Tunyu Zhang, Xinxi Zhang, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Hao Wang, Vladimir Pavlovic, Dimitris N. Metaxas
Abstract: Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substantial degradation in generation quality. To alleviate this, we propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. We incorporate Direct Discriminative Optimization (DDO), a reverse-KL objective that promotes mode-seeking distillation and encourages the student to concentrate on high-probability teacher modes. Across benchmarks, our approach consistently outperforms strong few-step baselines and standard training under tight step budgets. Although full-step de...
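The mode-seeking behavior of the reverse-KL objective described in the abstract can be illustrated with a small toy sketch. This is not the paper's implementation: the function names and toy logits below are ours, and a real DDO loss would operate on per-token model distributions during training. The sketch only shows why minimizing KL(student || teacher) rewards a student that concentrates on one high-probability teacher mode, whereas the forward direction KL(teacher || student) penalizes it heavily for ignoring the other mode.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(logits_q, logits_p):
    """KL(q || p) between the softmax distributions of two logit vectors.

    With q = student and p = teacher, this is the reverse-KL (mode-seeking)
    direction: the student pays a cost wherever it places mass that the
    teacher does not, so it is pushed toward the teacher's dominant modes.
    """
    q = softmax(logits_q)
    p = softmax(logits_p)
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))


# Toy example: a bimodal teacher and a student locked onto one of its modes.
teacher = [2.0, 2.0, -5.0]   # two high-probability modes
student = [6.0, -6.0, -6.0]  # nearly all mass on the first mode

reverse_kl = kl_divergence(student, teacher)  # KL(student || teacher)
forward_kl = kl_divergence(teacher, student)  # KL(teacher || student)

# Reverse KL stays small (the student sits inside a teacher mode), while
# forward KL is much larger (the teacher's second mode is uncovered).
print(f"reverse KL = {reverse_kl:.3f}, forward KL = {forward_kl:.3f}")
```

Running the sketch shows the reverse direction assigning a much smaller penalty to the mode-concentrated student than the forward direction does, which is the intuition behind using a reverse-KL objective for mode-seeking distillation.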