[2602.05695] SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference
Summary
The paper presents SweetSpot, an analytical model that predicts the energy efficiency of LLM inference and identifies the input-output length combinations that yield the highest energy efficiency.
Why It Matters
As LLM inference comes to dominate datacenter workloads, predicting its energy footprint is crucial for sustainable AI development. SweetSpot provides a framework for optimizing energy use, which can lower operating costs and environmental impact in data centers.
Key Takeaways
- SweetSpot identifies optimal input-output length combinations for LLMs to maximize energy efficiency.
- The model reveals a non-linear relationship between sequence lengths and energy consumption.
- Aligning sequence lengths with efficiency sweet spots can reduce energy usage by up to 33.41x.
- The model was validated on a diverse set of LLMs running on NVIDIA H100 GPUs, achieving a mean MAPE (mean absolute percentage error) of 1.79%.
- This framework can inform strategies for truncation, summarization, and adaptive generation in production systems.
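The validation figure in the takeaways is a mean absolute percentage error (MAPE). A minimal sketch of the metric is below. The summary does not say how per-model errors are aggregated into a "mean MAPE", so averaging per-model scores in `mean_mape` is an assumption, and both function names are hypothetical.

```python
from collections.abc import Sequence


def mape(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error between measurements and predictions, in percent."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    # Relative error of each prediction, measured against the true (measured) value.
    errors = [abs(m - p) / abs(m) for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mean_mape(per_model: dict[str, tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average of per-model MAPE scores (assumed aggregation; hypothetical helper)."""
    if not per_model:
        raise ValueError("need at least one model")
    scores = [mape(measured, predicted) for measured, predicted in per_model.values()]
    return sum(scores) / len(scores)
```

For example, `mape([100.0, 200.0], [98.0, 206.0])` gives about 2.5, since the two relative errors are 2% and 3%.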
arXiv:2602.05695 [cs.AI]. Submitted on 5 Feb 2026 (v1); last revised 23 Feb 2026 (v2).
Authors: Hiari Pizzini Cavagna, Andrea Proia, Giacomo Madella, Giovanni B. Esposito, Francesco Antici, Daniele Cesarini, Zeynep Kiziltan, Andrea Bartolini
Abstract: Large Language Model (LLM) inference is central to modern AI applications and dominates worldwide datacenter workloads, making it critical to predict its energy footprint. Existing approaches estimate energy consumption as a simple linear function of input and output sequence lengths. However, by analyzing the autoregressive structure of Transformers, which implies a fundamentally non-linear relationship between input and output sequence lengths and energy consumption, we demonstrate the existence of a generation energy minimum. Peak efficiency occurs with short-to-moderate inputs and medium-length outputs, while efficiency drops sharply for long inputs or very short outputs. Consequently, we propose SweetSpot, an analytical model derived from the computational and memory-access complexity of the Transformer architecture, which accurately characterizes the efficiency curve as a function of input and output lengths. ...
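The abstract's argument, that prefill cost and per-step memory traffic make energy per generated token non-linear in the input and output lengths, can be illustrated with a deliberately simplified cost model. This is a hypothetical sketch, not SweetSpot's published equations: the coefficients `E_FIXED`, `A_LIN`, `B_QUAD`, `C_STEP`, `D_KV`, the KV-cache accounting, and all function names are invented for illustration, and the units are arbitrary.

```python
# Toy energy model for autoregressive LLM inference (hypothetical, NOT the SweetSpot equations).
# All coefficients are invented, in arbitrary energy units.
E_FIXED = 50.0  # per-request overhead
A_LIN = 0.01    # prefill cost per input token (linear layers)
B_QUAD = 1e-5   # prefill attention cost, quadratic in input length
C_STEP = 1.0    # per decode step: streaming the model weights from memory
D_KV = 0.001    # per decode step and per cached token: reading the KV cache


def request_energy(n_in: int, n_out: int) -> float:
    """Toy energy of one request with n_in prompt tokens and n_out generated tokens."""
    prefill = A_LIN * n_in + B_QUAD * n_in**2
    # In this toy accounting, decode step t (t = 0 .. n_out - 1) reads n_in + t cached
    # tokens, so the total KV-cache traffic is n_in * n_out + n_out * (n_out - 1) / 2.
    kv_reads = n_in * n_out + n_out * (n_out - 1) / 2
    decode = C_STEP * n_out + D_KV * kv_reads
    return E_FIXED + prefill + decode


def energy_per_output_token(n_in: int, n_out: int) -> float:
    """Energy spent per generated token (lower means more efficient)."""
    if n_out < 1:
        raise ValueError("n_out must be at least 1")
    return request_energy(n_in, n_out) / n_out


def sweet_spot_output_len(n_in: int, max_out: int = 4096) -> int:
    """Brute-force the output length in [1, max_out] that minimizes energy per token."""
    return min(range(1, max_out + 1), key=lambda n: energy_per_output_token(n_in, n))


if __name__ == "__main__":
    for n_in in (128, 512, 2048):
        best = sweet_spot_output_len(n_in)
        print(f"n_in={n_in}: best n_out={best}, "
              f"energy/token={energy_per_output_token(n_in, best):.3f}")
```

In this toy model the per-token energy is (E_FIXED + prefill)/n_out + C_STEP + D_KV*n_in + D_KV*(n_out - 1)/2, so its minimum lies near n_out = sqrt(2*(E_FIXED + prefill)/D_KV). Very short outputs spread the fixed and prefill cost over few tokens, very long outputs pay for an ever larger KV cache, and longer inputs raise the whole curve. By contrast, a purely linear model E = a*n_in + b*n_out + c has per-token energy (a*n_in + c)/n_out + b, which only decreases with n_out and therefore cannot show an interior minimum.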