[2603.28795] StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving
Computer Science > Operating Systems
arXiv:2603.28795 (cs)
[Submitted on 24 Mar 2026]
Title: StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving
Authors: Azam Nouri

Abstract: We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior caching approaches typically reuse either full responses (semantic caching) or model-internal KV/prefix states, which are respectively brittle under partial changes or tightly coupled to specific backends. We present StepCache, a backend-agnostic step-level reuse layer that segments outputs into ordered steps, retrieves the best-matching cached request, verifies steps using lightweight task-aware checks, and regenerates only failing regions via selective patching. StepCache additionally supports strict structured-output enforcement for JSON, including single-step extraction, required-key constraints, and one-shot repair, as well as conservative skip-reuse fallbacks for semantic changes. For linear equations, StepCache promotes verification into correction via a bounded repair loop with a deterministic fallback that guarantees correctness when the backend model fails. In a CPU-only perturbation-heavy ...
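
The segment, verify, and patch flow described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`split_steps`, `numbers_consistent`, `reuse_with_patching`, `fake_backend`) are hypothetical, steps are assumed to be one per line, the "task-aware check" is a toy rule that flags any number not present in the new request, and the backend call is replaced by a deterministic stand-in so the example runs without a model.

```python
import re
from typing import Callable, List, Tuple

# Matches integers and simple decimals such as 50 or 3.5.
NUM = re.compile(r"-?\d+(?:\.\d+)?")


def split_steps(text: str) -> List[str]:
    """Segment an output into ordered steps (assumption: one step per non-empty line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def numbers_consistent(step: str, request: str) -> bool:
    """Toy verification: every number in the step must also appear in the request."""
    allowed = set(NUM.findall(request))
    return all(n in allowed for n in NUM.findall(step))


def reuse_with_patching(
    cached_output: str,
    request: str,
    verify: Callable[[str, str], bool],
    regenerate: Callable[[str, str], str],
) -> Tuple[List[str], List[int]]:
    """Reuse cached steps that pass verification; regenerate only the failing ones.

    Returns the final steps and the indices of the steps that were patched.
    """
    final_steps: List[str] = []
    patched: List[int] = []
    for index, step in enumerate(split_steps(cached_output)):
        if verify(step, request):
            final_steps.append(step)  # reused as-is
        else:
            final_steps.append(regenerate(step, request))  # selective patch
            patched.append(index)
    return final_steps, patched


def fake_backend(step: str, request: str) -> str:
    """Stand-in for a model call: swaps stale numbers for the request's first number."""
    new_value = NUM.findall(request)[0]
    return NUM.sub(new_value, step)


cached = (
    "Parse the input CSV.\n"
    "Filter rows where score > 50.\n"
    "Return the result as JSON."
)
request = "Filter the CSV to rows with score > 70 and return JSON."

steps, patched = reuse_with_patching(cached, request, numbers_consistent, fake_backend)
for step in steps:
    print(step)
print("patched step indices:", patched)
```

In this sketch only the second step is regenerated, because it is the only one carrying a constant that no longer matches the request; the other two steps are served from the cached output unchanged. A real deployment would presumably replace `fake_backend` with a call to the serving backend and use checks suited to the task.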