[2602.11761] MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling
Computer Science > Computation and Language

arXiv:2602.11761 (cs)

[Submitted on 12 Feb 2026 (v1), last revised 28 Feb 2026 (this version, v2)]

Title: MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Authors: MiniCPM Team: Wenhao An, Yingfa Chen, Yewei Fang, Jiayi Li, Xin Li, Yaohui Li, Yishan Li, Yuxuan Li, Biyuan Lin, Chuan Liu, Hezi Liu, Siyuan Liu, Hongya Lyu, Yinxu Pan, Shixin Ren, Xingyu Shen, Zhou Su, Haojun Sun, Yangang Sun, Zhen Leng Thai, Xin Tian, Rui Wang, Xiaorong Wang, Yudong Wang, Bo Wu, Xiaoyue Xu, Dong Xu, Shuaikang Xue, Jiawei Yang, Bowen Zhang, Jinqian Zhang, Letian Zhang, Shengnan Zhang, Xinyu Zhang, Xinyuan Zhang, Zhu Zhang, Hengyu Zhao, Jiacheng Zhao, Zhi Zheng, Jie Zhou, Zihan Zhou, Shuo Wang, Chaojun Xiao, Xu Han, Zhiyuan Liu, Maosong Sun

Abstract: The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention mechanisms attempt to mitigate these issues, they typically involve a trade-off between memory efficiency and model performance. This paper introduces MiniCPM-SALA, a 9B-parameter hybrid architecture that integrates the high-fidelity long-conte...
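The hybrid-attention idea named in the abstract can be made concrete with a minimal sketch. To be clear, this is not MiniCPM-SALA's actual architecture: the sliding-window sparsity pattern, the elu+1 kernel feature map for the linear branch, the alternating layer schedule, and every function and parameter name below are assumptions chosen for illustration only.

```python
# Minimal sketch of hybridizing sparse and linear attention (illustrative only;
# NOT the MiniCPM-SALA architecture described in the paper). Assumes PyTorch.
import torch
import torch.nn.functional as F


def sliding_window_attention(q, k, v, window: int = 64):
    """Sparse attention: each query attends to at most `window` recent keys,
    keeping exact softmax attention locally at O(n * window) cost."""
    n, scale = q.shape[-2], q.shape[-1] ** 0.5
    scores = q @ k.transpose(-2, -1) / scale
    idx = torch.arange(n)
    offset = idx[:, None] - idx[None, :]
    mask = (offset >= 0) & (offset < window)          # causal, local band
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v


def causal_linear_attention(q, k, v, eps: float = 1e-6):
    """Linear attention: a kernel feature map (elu + 1) replaces softmax, so
    the whole prefix is summarized by a fixed-size (d x e) running state."""
    q, k = F.elu(q) + 1.0, F.elu(k) + 1.0
    kv = torch.cumsum(k.unsqueeze(-1) * v.unsqueeze(-2), dim=-3)  # (..., n, d, e)
    z = torch.cumsum(k, dim=-2)                                   # (..., n, d)
    num = torch.einsum("...nd,...nde->...ne", q, kv)
    den = torch.einsum("...nd,...nd->...n", q, z).unsqueeze(-1)
    return num / (den + eps)


def hybrid_stack(x, w_q, w_k, w_v, pattern=("sparse", "linear")):
    """Hypothetical hybrid: alternate sparse and linear attention layers with
    residual connections (single head, shared projections, for brevity)."""
    for kind in pattern:
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        attn = sliding_window_attention(q, k, v) if kind == "sparse" \
            else causal_linear_attention(q, k, v)
        x = x + attn
    return x


if __name__ == "__main__":
    torch.manual_seed(0)
    d = 32
    x = torch.randn(1, 128, d)                        # (batch, seq, dim)
    w_q, w_k, w_v = (torch.randn(d, d) / d ** 0.5 for _ in range(3))
    print(hybrid_stack(x, w_q, w_k, w_v).shape)       # torch.Size([1, 128, 32])
```

In this sketch the sparse branch preserves exact attention over a local window while the linear branch carries arbitrarily long context in a constant-size state, which is the general trade-off the abstract describes; where each mechanism sits in the layer stack is exactly the kind of design decision a hybrid architecture must make.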