[2601.17551] GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
Computer Science > Performance
arXiv:2601.17551 (cs)
[Submitted on 24 Jan 2026 (v1), last revised 27 Feb 2026 (this version, v2)]

Title: GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
Authors: Thomas Ziller, Shashikant Ilager, Alessandro Tundo, Ezio Bartocci, Leonardo Mariani, Ivona Brandic

Abstract: Large language models (LLMs) demonstrate remarkable capabilities, but their broad deployment is limited by significant computational resource demands, particularly energy consumption during inference. Static, one-model-fits-all inference strategies are often inefficient, as they neither exploit the diverse range of available models nor adapt to varying query requirements. This paper presents GreenServ, a dynamic, context-aware routing framework that optimizes the trade-off between inference accuracy and energy efficiency. GreenServ extracts lightweight contextual features from each query, including task type, semantic cluster, and text complexity, and routes queries to the most suitable model from a heterogeneous pool, based on observed accuracy and energy usage. We employ a multi-armed bandit approach to learn adaptive routing policies online. This approach operates under partial feedback, eliminates the need for extensive offline calibration, and streamlines...
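The abstract describes learning routing policies online with a multi-armed bandit under partial feedback. As a minimal sketch of that idea (not the paper's actual implementation), the following per-context epsilon-greedy bandit treats each model in the pool as an arm and uses a hypothetical reward that trades off observed accuracy against normalized energy; all class, parameter, and context names here are illustrative assumptions:

```python
import random
from collections import defaultdict

class BanditRouter:
    """Per-context epsilon-greedy bandit: one arm per model in the pool.

    Hypothetical sketch of context-aware routing: the reward blends
    accuracy and energy with an assumed weighting, not GreenServ's
    actual reward definition.
    """

    def __init__(self, models, epsilon=0.1, energy_weight=0.5):
        self.models = list(models)
        self.epsilon = epsilon            # exploration rate
        self.energy_weight = energy_weight
        # Running mean reward and pull count per (context, model) pair.
        self.value = defaultdict(float)
        self.count = defaultdict(int)

    def route(self, context):
        """Pick a model for a query context (e.g. task type + complexity bin)."""
        if random.random() < self.epsilon:
            return random.choice(self.models)            # explore
        return max(self.models,
                   key=lambda m: self.value[(context, m)])  # exploit

    def update(self, context, model, accuracy, energy_joules, energy_max=100.0):
        """Partial feedback: reward is observed only for the chosen model."""
        reward = accuracy - self.energy_weight * (energy_joules / energy_max)
        key = (context, model)
        self.count[key] += 1
        # Incremental mean update of the arm's estimated reward.
        self.value[key] += (reward - self.value[key]) / self.count[key]
```

A usage sketch: after a few observed (accuracy, energy) outcomes, the router starts preferring the model with the better accuracy-energy trade-off for that context, without any offline calibration.

```python
router = BanditRouter(["small-llm", "large-llm"], epsilon=0.0)
ctx = ("summarization", "low_complexity")
router.update(ctx, "small-llm", accuracy=0.90, energy_joules=15.0)
router.update(ctx, "large-llm", accuracy=0.95, energy_joules=90.0)
router.route(ctx)  # → "small-llm" (higher reward after the energy penalty)
```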