[2510.24821] Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Computer Science > Computer Vision and Pattern Recognition
arXiv:2510.24821 (cs)
[Submitted on 28 Oct 2025 (v1), last revised 26 Mar 2026 (this version, v3)]
Authors: Inclusion AI: Bowen Ma, Cheng Zou, ChengKun Du, Canxiang Yan, Chunxiang Jin, Chunjie Shen, Chenyu Lian, Chengxiang Fan, Dandan Zheng, Fudong Wang, Furong Xu, Guangming Yao, Haohao Liu, Han Peng, Jun Zhou, Junluan Xia, Jingdong Chen, Jianing Li, Jianxin Sun, Jianjiang Zhu, Jianping Jiang, Jinpeng Ou, Jun Peng, Jin Peng, Kaixiang Ji, Li Tang, Libin Wang, Lixiang Ru, Longhua Tan, Lu Ma, Lan Wang, Mochen Bai, Minghong Cai, Mingxue Yang, Ning Gao, Qingpei Guo, Qinglong Zhang, Qiang Xu, Qin Zhao, Rui Liu, Ruijie Xiong, Ruobing Zheng, Sirui Gao, Shaoxiong Lin, Tao Zhang, Tianqi Li, Tinghao Liu, Tongli Wang, Taoye Huang, Weilong Chai, Xiaomei Wang, Xiaolong Wang, Xiaojian Liu, Xiao Lu, Xiaoyu Li, Xingning Dong, Xuzheng Yu, Xuezhi Wang, Yi Yuan, Yuting Gao, Yuting Xiao, Yunxiao Sun, Yipeng Chen, Yifan Mao, Yifei Wu, Yongjie Lyu, Yingying Zhang, YuQian Li, Ziping Ma, Zhiqiang Fang, Zhihao Qiu, Ziyuan Huang, Zizheng Yang, Zhengyu He
Abstract: We propose Ming-Flash-Omni, an upgraded version of Ming-Omni, built upon a sparser Mixture-of-...