[2605.07985] Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation
arXiv:2605.07985 [cs.DC] (Computer Science > Distributed, Parallel, and Cluster Computing)
Submitted on 8 May 2026

Title: Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation
Authors: Joon Ha Kim, Geon-Woo Kim, Anoop Rachakonda, Daehyeok Kim

Abstract: Selecting the optimal LLM inference configuration requires evaluation across hardware, serving engines, attention backends, and model architectures, since no single choice performs best across all workloads. Profile-based simulators are the standard tool, yet they hardcode their operation set to a specific configuration and re-profile every operation from scratch, making exploration prohibitively expensive. This cost stems from a missing structural understanding: every input dimension of each operation is either fixed by the model configuration or determined by the incoming request. Many model-configuration values (e.g., head size, layer count) recur across models, so the same operation runs in many configurations; a single sweep over the request-dependent dimensions can serve them all. We present Dooly, which exploits this structure to achieve configuration-agnostic, redundancy-aware profiling. Dooly performs a single inference pass, labels each input dimension with its origin via taint propagation, and selectively profiles ...
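The core idea in the abstract, labeling each operation's input dimensions by origin so that only request-dependent dimensions need a profiling sweep, can be illustrated with a minimal sketch. This is not Dooly's actual implementation; all names (`Origin`, `Dim`, `matmul_dims`) are hypothetical, and the taint rule shown (output dimensions inherit the taint of the input dimensions they come from) is a simplified stand-in for the paper's taint propagation:

```python
from dataclasses import dataclass
from enum import Enum

class Origin(Enum):
    CONFIG = "config"    # fixed by the model configuration (e.g., head size)
    REQUEST = "request"  # determined by the incoming request (e.g., sequence length)

@dataclass(frozen=True)
class Dim:
    name: str
    origin: Origin

def matmul_dims(a, b):
    """Propagate dimension taints through an (M, K) x (K, N) matmul.

    The output shape is (M, N); each output dimension inherits the
    origin label of the input dimension it is copied from.
    """
    return (a[0], b[1])

# Example: an attention projection. The sequence length comes from the
# incoming request; hidden size and head size come from the model config.
seq = Dim("seq_len", Origin.REQUEST)
hid = Dim("hidden", Origin.CONFIG)
head = Dim("head_dim", Origin.CONFIG)

out = matmul_dims((seq, hid), (hid, head))

# Only request-tainted dimensions vary at serving time, so only they need
# a profiling sweep; config-tainted values are fixed per configuration and,
# when they recur across models, the same profile can be reused.
sweep_dims = [d.name for d in out if d.origin is Origin.REQUEST]
print(sweep_dims)  # -> ['seq_len']
```

In this toy version, a single sweep over `seq_len` covers every model configuration that shares the same `hidden` and `head_dim` values, which is the redundancy the paper describes exploiting.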