[2602.11761] MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

[2602.11761] MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

arXiv - Machine Learning 4 min read

About this article

Abstract page for arXiv paper 2602.11761: MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

Computer Science > Computation and Language arXiv:2602.11761 (cs) [Submitted on 12 Feb 2026 (v1), last revised 28 Feb 2026 (this version, v2)] Title:MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling Authors:MiniCPM Team: Wenhao An, Yingfa Chen, Yewei Fang, Jiayi Li, Xin Li, Yaohui Li, Yishan Li, Yuxuan Li, Biyuan Lin, Chuan Liu, Hezi Liu, Siyuan Liu, Hongya Lyu, Yinxu Pan, Shixin Ren, Xingyu Shen, Zhou Su, Haojun Sun, Yangang Sun, Zhen Leng Thai, Xin Tian, Rui Wang, Xiaorong Wang, Yudong Wang, Bo Wu, Xiaoyue Xu, Dong Xu, Shuaikang Xue, Jiawei Yang, Bowen Zhang, Jinqian Zhang, Letian Zhang, Shengnan Zhang, Xinyu Zhang, Xinyuan Zhang, Zhu Zhang, Hengyu Zhao, Jiacheng Zhao, Zhi Zheng, Jie Zhou, Zihan Zhou, Shuo Wang, Chaojun Xiao, Xu Han, Zhiyuan Liu, Maosong Sun View a PDF of the paper titled MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling, by MiniCPM Team: Wenhao An and 45 other authors View PDF HTML (experimental) Abstract:The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention mechanisms attempt to mitigate these issues, they typically involve a trade-off between memory efficiency and model performance. This paper introduces MiniCPM-SALA, a 9B-parameter hybrid architecture that integrates the high-fidelity long-conte...

Originally published on March 03, 2026. Curated by AI News.

Related Articles

Claude Mythos and misguided open-weight fearmongering
Llms

Claude Mythos and misguided open-weight fearmongering

AI Tools & Products · 9 min ·
Llms

Anthropic Agrees to Rent CoreWeave AI Capacity to Power Claude

AI Tools & Products · 1 min ·
CoreWeave strikes a deal to power Anthropic's Claude AI models — and the stock surges 12%
Llms

CoreWeave strikes a deal to power Anthropic's Claude AI models — and the stock surges 12%

AI Tools & Products · 3 min ·
Walmart’s AI Push Links Gemini App Experience With U.S. Manufacturing Shift
Llms

Walmart’s AI Push Links Gemini App Experience With U.S. Manufacturing Shift

AI Tools & Products · 6 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime