
[2602.05695] SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference

arXiv - AI, February 24, 2026

Summary

The paper presents SweetSpot, an analytical model designed to predict energy efficiency in LLM inference, revealing optimal input-output length combinations for reduced energy consumption.

Why It Matters

As LLMs increasingly dominate AI workloads, understanding their energy efficiency is crucial for sustainable AI development. SweetSpot provides a framework for optimizing energy use, which can lead to significant cost savings and environmental benefits in data centers.

Key Takeaways

SweetSpot identifies optimal input-output length combinations for LLMs to maximize energy efficiency.
The model reveals a non-linear relationship between sequence lengths and energy consumption.
Aligning sequence lengths with efficiency sweet spots can reduce energy usage by up to 33.41x.
The model was validated using diverse LLMs on NVIDIA H100 GPUs, achieving a mean MAPE of 1.79%.
This framework can inform strategies for truncation, summarization, and adaptive generation in production systems.
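The validation figure above is a mean absolute percentage error (MAPE). As a reminder of what that metric measures, here is a minimal sketch of the standard MAPE formula. The function name and the energy readings are hypothetical and chosen for illustration only; they are not data from the paper.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, returned in percent."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative error of each prediction against its measurement.
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical energy readings in joules (illustrative, not from the paper).
measured = [100.0, 200.0, 400.0]
predicted = [98.0, 204.0, 400.0]
error = mape(measured, predicted)
print(f"MAPE = {error:.2f}%")
```

A lower value means the model's predicted energy tracks the measured energy more closely across the tested configurations.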

Computer Science > Artificial Intelligence
arXiv:2602.05695 (cs)
[Submitted on 5 Feb 2026 (v1), last revised 23 Feb 2026 (this version, v2)]

Title: SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference

Authors: Hiari Pizzini Cavagna, Andrea Proia, Giacomo Madella, Giovanni B. Esposito, Francesco Antici, Daniele Cesarini, Zeynep Kiziltan, Andrea Bartolini

Abstract: Large Language Models (LLMs) inference is central to modern AI applications, dominating worldwide datacenter workloads, making it critical to predict its energy footprint. Existing approaches estimate energy consumption as a simple linear function of input and output sequence. However, by analyzing the autoregressive structure of Transformers, which implies a fundamentally non-linear relationship between input and output sequence lengths and energy consumption, we demonstrate the existence of a generation energy minima. Peak efficiency occurs with short-to-moderate inputs and medium-length outputs, while efficiency drops sharply for long inputs or very short outputs. Consequently, we propose SweetSpot, an analytical model derived from the computational and memory-access complexity of the Transformer architecture, which accurately characterizes the efficiency curve as a function of input and output lengths....
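The abstract's qualitative claim (an energy minimum at medium output lengths, with efficiency falling for long inputs or very short outputs) can be illustrated with a toy cost model. The sketch below is not SweetSpot's formula, which this excerpt does not give. It assumes a fixed per-request overhead, a prefill cost that grows quadratically with input length, and a per-step decode cost that grows with the number of cached tokens. All function names and coefficients are hypothetical and in arbitrary units.

```python
def toy_energy_per_output_token(n_in, n_out, fixed=50.0, a=0.01, b=1e-5, c=0.2, d=1e-4):
    """Toy energy per generated token in arbitrary units (made-up coefficients)."""
    # Assumed prefill cost: linear term plus quadratic attention over the prompt.
    prefill = a * n_in + b * n_in ** 2
    # Assumed decode cost: step t costs c + d * (n_in + t), because it reads
    # n_in + t cached tokens. Summing t = 1..n_out gives this closed form.
    decode = n_out * (c + d * n_in) + d * n_out * (n_out + 1) / 2
    return (fixed + prefill + decode) / n_out


def toy_sweet_spot(n_in, candidates=range(1, 4097)):
    """Output length in `candidates` with the lowest toy energy per token."""
    return min(candidates, key=lambda n_out: toy_energy_per_output_token(n_in, n_out))


best = toy_sweet_spot(512)
print("toy sweet-spot output length for a 512-token input:", best)
```

In this toy, very short outputs are inefficient because the fixed and prefill costs are amortized over few tokens, while very long outputs are inefficient because each additional token reads a larger cache. That is the same shape the abstract describes, though the real model's terms and coefficients may differ.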
