[2512.16762] NRGPT: An Energy-based Alternative for GPT

arXiv - Machine Learning · 3 min read

Summary

The paper presents NRGPT, an energy-based alternative to GPT that integrates energy-based modeling with language modeling and demonstrates its effectiveness across a range of tasks.

Why It Matters

As generative models like GPT dominate natural language processing, exploring alternative frameworks such as energy-based modeling can deepen understanding and improve performance. NRGPT offers insight into resistance to overfitting and new approaches to inference, potentially influencing future AI development.

Key Takeaways

  • NRGPT integrates energy-based modeling with traditional GPT architectures.
  • The model appears more resistant to overfitting, showing signs of it only after very long training.
  • Empirical results indicate effective performance on diverse language tasks.

Computer Science > Machine Learning

arXiv:2512.16762 (cs) [Submitted on 18 Dec 2025 (v1), last revised 25 Feb 2026 (this version, v2)]

Title: NRGPT: An Energy-based Alternative for GPT
Authors: Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov

Abstract: Generative Pre-trained Transformer (GPT) architectures are the most popular design for language modeling. Energy-based modeling is a different paradigm that views inference as a dynamical process operating on an energy landscape. We propose a minimal modification of the GPT setting to unify it with the EBM framework. The inference step of our model, which we call eNeRgy-GPT (NRGPT), is conceptualized as an exploration of the tokens on the energy landscape. We prove, and verify empirically, that under certain circumstances this exploration becomes gradient descent, although such dynamics do not necessarily yield the best-performing models. We demonstrate that our model performs well on simple language (Shakespeare dataset), algebraic ListOps tasks, and richer settings such as OpenWebText language modeling. We also observe that our models may be more resistant to overfitting, doing so only during very long training.

Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2512.16762 [cs.LG] (or arXiv:2512.16762v2 [cs.LG] for this version)
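To make the core idea concrete: in an energy-based model, inference is not a single forward pass but a descent on an energy landscape until a minimum is reached. The toy sketch below illustrates this with a simple quadratic energy; it is a generic illustration of energy-based inference, not NRGPT's actual architecture or energy function, which the paper defines over token representations in a GPT-style transformer.

```python
import numpy as np

# Toy energy E(x) = 0.5 x^T A x - b^T x, with A symmetric positive
# definite so the landscape has a unique minimum at A x = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])

def energy(x):
    return 0.5 * x @ A @ x - b @ x

def grad_energy(x):
    return A @ x - b

# Inference = gradient descent on the energy, starting from an
# initial state (a stand-in for an initial token representation).
x = np.zeros(2)
lr = 0.1
for _ in range(200):
    x = x - lr * grad_energy(x)

# The descent should land at the closed-form energy minimum.
x_star = np.linalg.solve(A, b)
print(np.allclose(x, x_star, atol=1e-6))  # True
```

In NRGPT, the analogue of this descent is the paper's "exploration of the tokens on the energy landscape," which the authors prove reduces to gradient descent under certain circumstances.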

Related Articles

Anthropic’s Unreleased Claude Mythos Might Be The Most Advanced AI Model Yet

Anthropic is testing an unreleased artificial intelligence (AI) model with capabilities that exceed any system it has previously released...

AI Tools & Products · 5 min ·
Anthropic leaks part of Claude Code's internal source code

Claude Code has seen massive adoption over the last year, and its run-rate revenue had swelled to more than $2.5 billion as of February.

AI Tools & Products · 3 min ·
Australian government and Anthropic sign MOU for AI safety and research

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

AI Tools & Products · 5 min ·
Penguin to sue OpenAI over ChatGPT version of German children’s book

Publisher alleges AI research company’s chatbot violated its copyright over Coconut the Little Dragon series

AI Tools & Products · 3 min ·

