[2602.13367] Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
Summary
Nanbeige4.1-3B is a small generalist language model that combines strong agentic behavior, code generation, and general reasoning in a single model with only 3 billion parameters.
Why It Matters
This research suggests that small language models can perform well on complex tasks, challenging the assumption that larger models are always superior. It opens avenues for more efficient AI applications and broader access to advanced AI capabilities.
Key Takeaways
- Nanbeige4.1-3B achieves strong performance in reasoning and code generation with only 3 billion parameters.
- The model combines point-wise and pair-wise reward modeling to improve reasoning, preference alignment, and response quality.
- Complex data synthesis and turn-level supervision during training enable stable long-horizon tool interactions, with the model reliably executing up to 600 tool-call turns.
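The abstract does not say how the point-wise and pair-wise signals are combined. As a rough illustration only, the sketch below blends a scalar quality score with a Bradley-Terry win probability against a reference response. The function names, the `weight` parameter, and the linear blend are all assumptions made for this example, not the authors' method.

```python
import math


def pairwise_win_probability(candidate_logit: float, reference_logit: float) -> float:
    """Bradley-Terry probability that the candidate is preferred over the reference.

    Both logits are assumed to come from a hypothetical preference model.
    """
    return 1.0 / (1.0 + math.exp(-(candidate_logit - reference_logit)))


def combined_reward(
    pointwise_score: float,
    candidate_logit: float,
    reference_logit: float,
    weight: float = 0.5,
) -> float:
    """Blend a point-wise score in [0, 1] with a pair-wise win probability.

    `weight` is an illustrative hyperparameter: 1.0 uses only the point-wise
    score, 0.0 uses only the pair-wise signal.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    pairwise = pairwise_win_probability(candidate_logit, reference_logit)
    return weight * pointwise_score + (1.0 - weight) * pairwise


if __name__ == "__main__":
    # A response scored 0.8 point-wise that ties with the reference pair-wise.
    print(combined_reward(0.8, candidate_logit=0.0, reference_logit=0.0))
```

A linear blend is the simplest choice for a sketch like this. The point-wise term rewards absolute quality, while the pair-wise term rewards being better than a specific alternative.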
Computer Science > Artificial Intelligence
arXiv:2602.13367 (cs)
[Submitted on 13 Feb 2026]
Title: Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
Authors: Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Xiyun Xu, Yang Song, Yiming Jia, Yuntao Wen, Yunzhi Xu, Zekai Wang, Zhenwei An, Zhicong Sun, Zongchao Chen
Abstract: We present Nanbeige4.1-3B, a unified generalist language model that simultaneously achieves strong agentic behavior, code generation, and general reasoning with only 3B parameters. To the best of our knowledge, it is the first open-source small language model (SLM) to achieve such versatility in a single model. To improve reasoning and preference alignment, we combine point-wise and pair-wise reward modeling, ensuring high-quality, human-aligned responses. For code generation, we design complexity-aware rewards in Reinforcement Learning, optimizing both correctness and efficiency. In deep search, we perform complex data synthesis and incorporate turn-level supervision during training. This enables stable long-horizon tool interactions, allowing Nanbeige4.1-3B to reliably execute up to 600 tool-call turns for complex problem-solving. Extensive experimental results show that Nanbeige4.1-3B significantly outperforms prior models of similar scale, such as Nan...