[2602.05695] SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference

arXiv - AI · 4 min read · Article

Summary

The paper presents SweetSpot, an analytical model designed to predict energy efficiency in LLM inference, revealing optimal input-output length combinations for reduced energy consumption.

Why It Matters

As LLMs increasingly dominate AI workloads, understanding their energy efficiency is crucial for sustainable AI development. SweetSpot provides a framework for optimizing energy use, which can lead to significant cost savings and environmental benefits in data centers.

Key Takeaways

  • SweetSpot identifies optimal input-output length combinations for LLMs to maximize energy efficiency.
  • The model reveals a non-linear relationship between sequence lengths and energy consumption.
  • Aligning sequence lengths with efficiency sweet spots can reduce energy usage by a factor of up to 33.41×.
  • The model was validated across diverse LLMs on NVIDIA H100 GPUs, achieving a mean absolute percentage error (MAPE) of 1.79%.
  • This framework can inform strategies for truncation, summarization, and adaptive generation in production systems.

Computer Science > Artificial Intelligence · arXiv:2602.05695 (cs)

[Submitted on 5 Feb 2026 (v1), last revised 23 Feb 2026 (this version, v2)]

Title: SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference

Authors: Hiari Pizzini Cavagna, Andrea Proia, Giacomo Madella, Giovanni B. Esposito, Francesco Antici, Daniele Cesarini, Zeynep Kiziltan, Andrea Bartolini

Abstract: Large Language Model (LLM) inference is central to modern AI applications and dominates worldwide datacenter workloads, making it critical to predict its energy footprint. Existing approaches estimate energy consumption as a simple linear function of input and output sequence lengths. However, by analyzing the autoregressive structure of Transformers, which implies a fundamentally non-linear relationship between input and output sequence lengths and energy consumption, we demonstrate the existence of a generation energy minimum. Peak efficiency occurs with short-to-moderate inputs and medium-length outputs, while efficiency drops sharply for long inputs or very short outputs. Consequently, we propose SweetSpot, an analytical model derived from the computational and memory-access complexity of the Transformer architecture, which accurately characterizes the efficiency curve as a function of input and output lengths. ...
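The intuition behind the non-linearity can be sketched numerically. The toy model below is an illustration only, not the paper's actual SweetSpot formula: it assumes prefill cost grows quadratically with input length (attention over the prompt), per-step decode cost grows linearly with context length, and a fixed per-step overhead (e.g. weight loads). All coefficients (`a`, `b`, `c`) and both function names are hypothetical. Dividing total energy by tokens generated then yields an efficiency curve with an interior minimum: very short outputs amortize prefill poorly, while very long outputs pay growing per-token context costs.

```python
# Toy sketch (assumed cost model, not the paper's formula): energy per
# generated token for a decoder-only Transformer.

def energy_per_output_token(n_in, n_out, a=1.0, b=0.5, c=2.0):
    """Total energy divided by tokens generated.

    a: prefill coefficient, quadratic in input length (assumed)
    b: decode coefficient, linear in current context length (assumed)
    c: fixed per-step overhead, e.g. loading weights (assumed)
    """
    prefill = a * n_in ** 2                                   # one pass over the prompt
    decode = sum(b * (n_in + t) + c for t in range(n_out))    # autoregressive steps
    return (prefill + decode) / n_out

def sweet_spot(n_in, max_out=4096):
    """Output length minimizing energy per generated token for a given input."""
    return min(range(1, max_out + 1),
               key=lambda m: energy_per_output_token(n_in, m))
```

For example, with a 128-token prompt this toy model makes a 1-token completion far more energy-intensive per token than a 64-token one, and the per-token minimum lands at a medium output length, matching the paper's qualitative finding that efficiency drops sharply for very short outputs.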

Related Articles

Llms

Claude Max 20x usage hit 40% by Monday noon — how does Codex CLI compare?

I'm on Claude Max (the $100/mo plan) and noticed something that surprised me. By Monday noon I had already used 40% of the 20x monthly li...

Reddit - Artificial Intelligence · 1 min ·
Llms

How to use the new ChatGPT app integrations, including DoorDash, Spotify, Uber, and others | TechCrunch

Learn how to use Spotify, Canva, Figma, Expedia, and other apps directly in ChatGPT.

TechCrunch - AI · 10 min ·
Llms

Anthropic Restricts Claude Agent Access Amid AI Automation Boom in Crypto

AI Tools & Products · 7 min ·
Llms

Is cutting ‘please’ when talking to ChatGPT better for the planet? An expert explains

AI Tools & Products · 5 min ·