[2603.01068] LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.01068 (cs) [Submitted on 1 Mar 2026]

Title: LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model
Authors: Zebin You, Xiaolu Zhang, Jun Zhou, Chongxuan Li, Ji-Rong Wen

Abstract: We present LLaDA-o, an effective and length-adaptive omni diffusion model for multimodal understanding and generation. LLaDA-o is built on a Mixture of Diffusion (MoD) framework that decouples discrete masked diffusion for text understanding from continuous diffusion for visual generation, while coupling them through a shared, simple, and efficient attention backbone that reduces redundant computation for fixed conditions. Building on MoD, we further introduce a data-centric length-adaptation strategy that enables flexible-length decoding in multimodal settings without architectural changes. Extensive experiments show that LLaDA-o achieves state-of-the-art performance among omni diffusion models on multimodal understanding and generation benchmarks, and reaches 87.04 on DPG-Bench for text-to-image generation, supporting the effectiveness of unified omni diffusion modeling. Code is available at this https URL.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2603.01068 [cs.CV] (or arXiv:2603.01068v1 [cs.CV] for this version)
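
The MoD design sketched in the abstract, discrete masked diffusion on text tokens and continuous diffusion on image latents coupled through one shared attention trunk, can be illustrated with a minimal PyTorch-style sketch. Everything below is an illustrative assumption rather than the paper's released code: the module names (SharedBackbone, mod_step), the dimensions, the epsilon-prediction objective, and the noise schedule are all placeholders chosen to make the decoupled-losses / shared-backbone idea concrete.

import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID, DIM, IMG_DIM = 32000, 31999, 512, 16  # toy sizes, assumed

class SharedBackbone(nn.Module):
    """One transformer trunk processes both modalities (the 'coupling')."""
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=4)
        self.tok_emb = nn.Embedding(VOCAB, DIM)   # discrete text tokens
        self.img_proj = nn.Linear(IMG_DIM, DIM)   # continuous image latents
        self.text_head = nn.Linear(DIM, VOCAB)    # recovers masked tokens
        self.eps_head = nn.Linear(DIM, IMG_DIM)   # predicts injected noise

    def forward(self, text_ids, img_latents):
        h = torch.cat([self.tok_emb(text_ids), self.img_proj(img_latents)], dim=1)
        h = self.trunk(h)
        t_len = text_ids.size(1)
        return self.text_head(h[:, :t_len]), self.eps_head(h[:, t_len:])

def mod_step(model, text_ids, img_latents):
    # Discrete masked diffusion on text: mask a random fraction of tokens
    # and train the model to recover them (cross-entropy on masked slots).
    ratio = torch.rand(())
    masked = torch.rand_like(text_ids, dtype=torch.float) < ratio
    masked[:, 0] = True  # guarantee at least one masked slot in this toy loss
    corrupted = text_ids.masked_fill(masked, MASK_ID)

    # Continuous diffusion on image latents: add Gaussian noise at a random
    # level and train the model to predict that noise (epsilon objective).
    t = torch.rand(img_latents.size(0), 1, 1)
    eps = torch.randn_like(img_latents)
    noisy = (1 - t).sqrt() * img_latents + t.sqrt() * eps

    logits, eps_pred = model(corrupted, noisy)
    text_loss = F.cross_entropy(logits[masked], text_ids[masked])
    img_loss = F.mse_loss(eps_pred, eps)
    return text_loss + img_loss

model = SharedBackbone()
loss = mod_step(model, torch.randint(0, VOCAB - 1, (2, 12)), torch.randn(2, 64, IMG_DIM))
loss.backward()

In this sketch the two losses never mix gradients except through the shared trunk, which loosely mirrors the abstract's point that fixed conditions (e.g., the text prompt during image generation) can be encoded once by the common backbone instead of being re-processed per modality.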