[2602.17004] Arcee Trinity Large Technical Report

arXiv - Machine Learning

Summary

The Arcee Trinity Large Technical Report presents a new sparse Mixture-of-Experts model with 400 billion total parameters, of which 13 billion are activated per token, and details its architecture, training methods, and performance metrics.

Why It Matters

The report documents the architecture and training choices behind a 400B-parameter sparse Mixture-of-Experts model, including its attention design, expert routing, load balancing, and optimizer. These details could influence how future large models are designed and trained, making the work relevant to practitioners and researchers in the field.

Key Takeaways

  • Trinity Large features 400B total parameters (13B activated per token), with an architecture that includes gated attention and a new MoE load-balancing strategy, SMEBU; a sketch of the gating idea follows this list.
  • The models were trained on extensive datasets, with Trinity Large using 17 trillion tokens.
  • All three models completed training with zero loss spikes, indicating a stable training run.
  • The report also covers two smaller models, Trinity Nano (6B total / 1B activated) and Trinity Mini (26B total / 3B activated), trained at smaller scales.
  • The findings could influence future AI model designs and applications in various domains.
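The "gated attention" mentioned above is not defined in this summary. One common formulation applies an elementwise sigmoid gate, computed from the input token, to the attention output; the PyTorch sketch below illustrates only that idea. The module name, projection layout, and gate placement are assumptions, and Trinity's exact design may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSelfAttention(nn.Module):
    """Self-attention whose output is modulated by a per-channel sigmoid gate."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.gate = nn.Linear(d_model, d_model, bias=False)  # produces the gate logits
        self.out = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (batch, heads, seq, head_dim) layout for scaled dot-product attention
        q, k, v = (z.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # elementwise gate computed from the same input token, squashed to (0, 1)
        return self.out(torch.sigmoid(self.gate(x)) * attn)
```

As a quick smoke test, `GatedSelfAttention(512, 8)(torch.randn(2, 16, 512))` returns a tensor of shape `(2, 16, 512)`; the gate lets each channel of the attention output be attenuated per token.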

arXiv:2602.17004 (cs) [Submitted on 19 Feb 2026]

Title: Arcee Trinity Large Technical Report

Authors: Varun Singh, Lucas Krauss, Sami Jaghouar, Matej Sirovatka, Charles Goddard, Fares Obied, Jack Min Ong, Jannik Straube, Fern, Aria Harley, Conner Stewart, Colin Kealty, Maziyar Panahi, Simon Kirsten, Anushka Deshpande, Anneketh Vij, Arthur Bresnu, Pranav Veldurthi, Raghav Ravishankar, Hardik Bishnoi, DatologyAI Team, Arcee AI Team, Prime Intellect Team, Mark McQuade, Johannes Hagemann, Lucas Atkins

Abstract: We present the technical report for Arcee Trinity Large, a sparse Mixture-of-Experts model with 400B total parameters and 13B activated per token. Additionally, we report on Trinity Nano and Trinity Mini: Trinity Nano has 6B total parameters with 1B activated per token, and Trinity Mini has 26B total parameters with 3B activated per token. The models' modern architecture includes interleaved local and global attention, gated attention, depth-scaled sandwich norm, and sigmoid routing for Mixture-of-Experts. For Trinity Large, we also introduce a new MoE load balancing strategy titled Soft-clamped Momentum Expert Bias Updates (SMEBU). We train the models using the Muon optimizer. All three models completed training with zero loss spikes. Trinity Nano and Trinity Mini were pre-trained on 10 trillion to...
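The abstract names sigmoid routing and Soft-clamped Momentum Expert Bias Updates (SMEBU) but does not define them here. The sketch below is a minimal, hypothetical reading of those terms: sigmoid gate scores with top-k expert selection, plus a per-expert bias that is updated with momentum from the observed load imbalance and soft-clamped with tanh. Every constant, buffer name, and update rule is an assumption for illustration; the paper's actual SMEBU formulation should be consulted.

```python
import torch
import torch.nn as nn

class SigmoidRouter(nn.Module):
    """Top-k MoE router with sigmoid scores and a bias-based balancing term."""

    def __init__(self, d_model: int, n_experts: int, top_k: int,
                 bias_lr: float = 1e-3, momentum: float = 0.9, clamp: float = 1.0):
        super().__init__()
        self.top_k = top_k
        self.w = nn.Linear(d_model, n_experts, bias=False)
        # balancing state: a bias added to routing scores plus its momentum buffer
        self.register_buffer("expert_bias", torch.zeros(n_experts))
        self.register_buffer("bias_momentum", torch.zeros(n_experts))
        self.bias_lr, self.beta, self.clamp = bias_lr, momentum, clamp

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model); each expert gets an independent score in (0, 1)
        scores = torch.sigmoid(self.w(x))
        # the bias steers which experts are selected, but not the gate values
        _, topk_idx = (scores + self.expert_bias).topk(self.top_k, dim=-1)
        gates = scores.gather(-1, topk_idx)
        gates = gates / gates.sum(dim=-1, keepdim=True)  # normalize mixing weights
        return gates, topk_idx

    @torch.no_grad()
    def update_bias(self, topk_idx: torch.Tensor):
        # per-expert token counts compared against a perfectly balanced load
        n_experts = self.expert_bias.numel()
        load = torch.bincount(topk_idx.flatten(), minlength=n_experts).float()
        imbalance = load.mean() - load  # positive for under-used experts
        # momentum update, then soft-clamp the bias with tanh to bound its magnitude
        self.bias_momentum.mul_(self.beta).add_(imbalance, alpha=1 - self.beta)
        self.expert_bias.copy_(self.clamp * torch.tanh(
            (self.expert_bias + self.bias_lr * self.bias_momentum) / self.clamp))
```

In this reading, calling `update_bias(topk_idx)` after each batch nudges under-used experts' selection scores up and over-used ones down, while the tanh keeps the bias bounded so the balancing term never dominates routing; the mixing weights themselves still come from the raw sigmoid scores.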
