[2601.02031] Output Embedding Centering for Stable LLM Pretraining

Computer Science > Machine Learning

arXiv:2601.02031 (cs)

[Submitted on 5 Jan 2026 (v1), last revised 2 Apr 2026 (this version, v2)]

Title: Output Embedding Centering for Stable LLM Pretraining

Authors: Felix Stollenwerk, Anna Lokrantz, Niclas Hertzberg

Abstract: Pretraining of large language models is not only expensive but also prone to certain training instabilities. A specific instability that often occurs at the end of training is output logit divergence. The most widely used mitigation strategies, z-loss and logit soft-capping, merely address the symptoms rather than the underlying cause of the problem. In this paper, we analyze the instability from the perspective of the output embeddings' geometry and identify anisotropic embeddings as its source. Based on this, we propose output embedding centering (OEC) as a new mitigation strategy, and demonstrate that it suppresses output logit divergence. OEC can be implemented in two different ways: as a deterministic operation called $\mu$-centering, or a regularization method called $\mu$-loss. Our experiments show that both variants outperform z-loss in terms of training stability, while being on par with logit soft-capping. This holds true both in the presence and the absence of weight tying. As a secondary result, we find that $\mu$-loss is significantly less sensitive t...
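The abstract names the two OEC variants but does not spell out their formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes that $\mu$ is the mean of the output embedding vectors over the vocabulary, that $\mu$-centering subtracts $\mu$ from every output embedding, and that $\mu$-loss penalizes the squared norm of $\mu$. The function names and the `weight` parameter are hypothetical.

```python
# Illustrative sketch only. The definitions of mu-centering and mu-loss
# used here are assumptions, not taken from the paper.

def mean_embedding(E):
    """Mean output embedding over the vocabulary (the rows of E)."""
    vocab_size = len(E)
    dim = len(E[0])
    return [sum(row[j] for row in E) / vocab_size for j in range(dim)]


def mu_center(E):
    """Assumed mu-centering: subtract the mean embedding from every row."""
    mu = mean_embedding(E)
    return [[x - m for x, m in zip(row, mu)] for row in E]


def mu_loss(E, weight=1.0):
    """Assumed mu-loss: weighted squared norm of the mean embedding."""
    mu = mean_embedding(E)
    return weight * sum(m * m for m in mu)


def logits(E, h):
    """Output logits: dot product of each output embedding with hidden state h."""
    return [sum(e * x for e, x in zip(row, h)) for row in E]


# Toy output embedding matrix: 3 vocabulary entries, hidden size 2.
E = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
h = [0.5, -1.0]

centered = mu_center(E)
raw_logits = logits(E, h)
centered_logits = logits(centered, h)
```

Under these assumptions, centering shifts every logit by the same amount (the dot product of $\mu$ with the hidden state), so the softmax probabilities are unchanged while the mean logit becomes zero. The regularized variant would instead add `mu_loss(E, weight)` to the training loss and leave the embeddings to drift toward a zero mean.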

Originally published on April 03, 2026. Curated by AI News.
