
[2602.23036] LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure

arXiv - AI, February 27, 2026

Summary

LLMServingSim 2.0 is a unified system-level simulator for heterogeneous and disaggregated large language model (LLM) serving infrastructure. It models how hardware and software interact at runtime, supporting both performance analysis and system design.

Why It Matters

LLM serving performance is no longer determined by hardware or software choices in isolation, but by how the two interact at runtime through scheduling, data movement, and interconnect behavior. LLMServingSim 2.0 addresses this by jointly modeling heterogeneous hardware and disaggregated serving techniques in a single framework, so researchers and engineers can explore and validate system designs before deploying them.

Key Takeaways

LLMServingSim 2.0 models hardware-software interactions in LLM serving systems.
The simulator achieves high accuracy with an average error of 0.97% against real deployments.
It supports extensible integration of emerging accelerators and memory systems.
The unified framework enables efficient exploration of serving strategies and configurations.
Simulation times remain manageable at around 10 minutes for complex setups.
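
To make the accuracy figure above concrete, here is a minimal sketch of how an average error between simulated and measured results could be computed. It assumes the metric is a mean absolute percentage error, which this summary does not confirm. The function name and all numbers are hypothetical and are not taken from the paper.

```python
def mean_abs_pct_error(simulated, measured):
    """Mean absolute percentage error of simulated values relative to measurements."""
    if len(simulated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-sample relative error, expressed in percent of the measured value.
    errors = [abs(s - m) / abs(m) * 100.0 for s, m in zip(simulated, measured)]
    return sum(errors) / len(errors)


# Hypothetical throughput numbers (tokens/s), invented for illustration only.
measured = [1000.0, 2000.0, 4000.0]
simulated = [1010.0, 1980.0, 4040.0]

error_pct = mean_abs_pct_error(simulated, measured)
print(f"average error: {error_pct:.2f}%")
```

Each hypothetical sample here is off by 1% of its measured value, so the average is about 1%. An average this low can still hide larger errors on individual configurations, so per-configuration errors are worth checking alongside it.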

Computer Science > Distributed, Parallel, and Cluster Computing
arXiv:2602.23036 (cs)
[Submitted on 26 Feb 2026]

Title: LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure

Authors: Jaehong Cho, Hyunmin Choi, Guseul Heo, Jongse Park

Abstract: Large language model (LLM) serving infrastructures are undergoing a shift toward heterogeneity and disaggregation. Modern deployments increasingly integrate diverse accelerators and near-memory processing technologies, introducing significant hardware heterogeneity, while system software increasingly separates computation, memory, and model components across distributed resources to improve scalability and efficiency. As a result, LLM serving performance is no longer determined by hardware or software choices in isolation, but by their runtime interaction through scheduling, data movement, and interconnect behavior. However, understanding these interactions remains challenging, as existing simulators lack the ability to jointly model heterogeneous hardware and disaggregated serving techniques within a unified, runtime-driven framework. This paper presents LLMServingSim 2.0, a unified system-level simulator designed to make runtime-driven hardware-software interactions in heterogeneous and disagg...
