[2602.07238] Is there "Secret Sauce" in Large Language Model Development?

arXiv - Machine Learning

Computer Science > Artificial Intelligence
arXiv:2602.07238 (cs)
Submitted on 6 Feb 2026 (v1), last revised 3 May 2026 (this version, v2)

Title: Is there "Secret Sauce" in Large Language Model Development?
Authors: Matthias Mertens, Natalia Fischl-Lanzoni, Neil Thompson

Abstract: Do leading LLM developers possess a proprietary "secret sauce", or is LLM performance driven by scaling up compute? Using training and benchmark data for 809 models released between 2022 and 2025, we estimate scaling-law regressions with release-date and developer fixed effects. We find clear evidence of developer-specific efficiency advantages, but their importance depends on where models lie in the performance distribution. At the frontier, 80-90% of performance differences are explained by higher training compute, implying that scale, not proprietary technology, drives frontier advances. Away from the frontier, however, proprietary techniques and shared algorithmic progress substantially reduce the compute required to reach fixed capability thresholds. Some companies can systematically produce smaller models more efficiently. Strikingly, we also find substantial variation in model efficiency within companies; a firm can train two models with a more than 40x difference in compute efficiency. We also discuss the implications for AI leaders...
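To make the methodology concrete, the snippet below is a minimal Python sketch (not the authors' code) of a scaling-law regression with developer and release-date fixed effects. The column names, toy data, and functional form (benchmark score regressed on log training compute plus categorical dummies) are illustrative assumptions, not the paper's actual specification or dataset.

# Minimal sketch of a scaling-law regression with developer and
# release-year fixed effects. All names and data below are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data standing in for a model-level dataset (the paper uses 809 models).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "training_compute_flop": 10 ** rng.uniform(21, 26, n),
    "developer": rng.choice(["A", "B", "C", "D"], n),
    "release_year": rng.choice([2022, 2023, 2024, 2025], n),
})
df["benchmark_score"] = (
    5.0 * np.log10(df["training_compute_flop"])                   # scaling term
    + df["developer"].map({"A": 0.0, "B": 2.0, "C": -1.0, "D": 1.0})  # developer effect
    + 0.5 * (df["release_year"] - 2022)                            # algorithmic progress over time
    + rng.normal(0, 1, n)                                          # noise
)

# OLS: benchmark score on log compute plus developer and release-year fixed effects.
model = smf.ols(
    "benchmark_score ~ np.log10(training_compute_flop)"
    " + C(developer) + C(release_year)",
    data=df,
).fit()
print(model.summary())

In a specification like this, the coefficient on log compute captures the scaling effect, while the developer dummies capture firm-specific efficiency advantages, which is the decomposition the abstract refers to.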

Originally published on May 05, 2026. Curated by AI News.

Related Articles

[2602.01203] Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse
[2601.01322] LinMU: Multimodal Understanding Made Linear
[2512.05525] Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement
[2511.21678] Agentic Learner with Grow-and-Refine Multimodal Semantic Memory