[2512.16762] NRGPT: An Energy-based Alternative for GPT
Summary
The paper introduces NRGPT (eNeRgy-GPT), a minimal modification of the GPT architecture that unifies autoregressive language modeling with the energy-based modeling (EBM) framework, and demonstrates its effectiveness on language and algebraic tasks.
Why It Matters
As GPT-style generative models dominate natural language processing, alternative frameworks such as energy-based modeling offer a different view of inference, treating it as a dynamical process on an energy landscape. NRGPT suggests such models may resist overfitting for longer and admit new inference methods, potentially influencing future architecture designs.
Key Takeaways
- NRGPT integrates energy-based modeling with traditional GPT architectures.
- The model appears more resistant to overfitting, succumbing only after very long training.
- Empirical results indicate effective performance on diverse language tasks.
Computer Science > Machine Learning
arXiv:2512.16762 (cs) [Submitted on 18 Dec 2025 (v1), last revised 25 Feb 2026 (this version, v2)]
Title: NRGPT: An Energy-based Alternative for GPT
Authors: Nima Dehmamy, Benjamin Hoover, Bishwajit Saha, Leo Kozachkov, Jean-Jacques Slotine, Dmitry Krotov
Abstract: Generative Pre-trained Transformer (GPT) architectures are the most popular design for language modeling. Energy-based modeling is a different paradigm that views inference as a dynamical process operating on an energy landscape. We propose a minimal modification of the GPT setting to unify it with the EBM framework. The inference step of our model, which we call eNeRgy-GPT (NRGPT), is conceptualized as an exploration of the tokens on the energy landscape. We prove, and verify empirically, that under certain circumstances this exploration becomes gradient descent, although such dynamics do not necessarily yield the best-performing models. We demonstrate that our model performs well on simple language (Shakespeare dataset), algebraic ListOps tasks, and richer settings such as OpenWebText language modeling. We also observe that our models may be more resistant to overfitting, succumbing only after very long training.
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2512.16762 [cs.LG] (or arXiv:2512.16762v2 [cs.LG] for this version)
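The abstract describes inference as an exploration of tokens on an energy landscape that, under certain circumstances, becomes gradient descent. The paper's actual energy function and architecture are not given in this excerpt, so the sketch below illustrates the general idea with a hypothetical log-sum-exp ("modern Hopfield"-style) energy over continuous token vectors; the embeddings, energy, and hyperparameters here are illustrative assumptions, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, beta = 16, 8, 4.0
tokens = rng.standard_normal((vocab, d))  # hypothetical token embeddings


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def energy(x):
    # Hypothetical energy: E(x) = -(1/beta) * logsumexp(beta * T @ x) + 0.5 * ||x||^2.
    # Low energy means x is close to one of the stored token embeddings.
    s = beta * tokens @ x
    return -(s.max() + np.log(np.exp(s - s.max()).sum())) / beta + 0.5 * x @ x


def grad_energy(x):
    # Gradient of the energy above: -T^T softmax(beta * T @ x) + x.
    return -tokens.T @ softmax(beta * tokens @ x) + x


def infer(x0, lr=0.01, steps=500):
    """Inference as plain gradient descent on the energy landscape."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * grad_energy(x)
    return x


x0 = tokens[3] + 0.3 * rng.standard_normal(d)  # noisy start near token 3
x_final = infer(x0)
# Descent strictly lowers the energy from the noisy starting point.
```

With a large inverse temperature `beta`, the fixed points of this descent sit near the stored token embeddings, so the dynamics "snap" a noisy query onto a token, loosely mirroring the exploration-on-an-energy-landscape picture the abstract sketches.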