[2602.09345] AgentCgroup: Understanding and Controlling OS Resources of AI Agents
Summary
The paper characterizes the OS-level resource dynamics of sandboxed AI coding agents in multi-tenant cloud environments and presents AgentCgroup, an eBPF-based resource management system aimed at the inefficiencies that characterization exposes.
Why It Matters
As AI agents become prevalent in multi-tenant cloud settings, understanding their resource demands is crucial for optimizing performance and resource allocation. This research highlights significant inefficiencies in current resource management strategies and proposes a novel solution that could enhance AI agent performance and reduce waste.
Key Takeaways
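- OS-level execution (tool calls, container and agent initialization) accounts for 56-74% of end-to-end task latency.
- Memory, not CPU, is the concurrency bottleneck; memory spikes are driven by tool calls, with a peak-to-average ratio of up to 15.4x.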
- Existing resource controls do not match agent workloads: for example, policies are set per container while demand changes per tool call.
- AgentCgroup utilizes eBPF for adaptive resource management based on agent needs.
- Preliminary evaluations show improved isolation and reduced resource waste.
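
To make the memory takeaway concrete, the following minimal sketch computes a peak-to-average ratio for a memory trace and compares how many agents fit on a host when each container is given a fixed limit sized for its peak versus its average. The trace values, the host size, and the helper names `peak_to_average` and `agents_per_host` are hypothetical and chosen only for illustration; they are not taken from the paper or from AgentCgroup.

```python
def peak_to_average(samples_mib):
    """Return max(samples) / mean(samples) for a memory trace in MiB."""
    if not samples_mib:
        raise ValueError("memory trace is empty")
    average = sum(samples_mib) / len(samples_mib)
    if average <= 0:
        raise ValueError("average memory must be positive")
    return max(samples_mib) / average


def agents_per_host(host_mib, per_agent_limit_mib):
    """Number of agents that fit when each gets a fixed memory limit."""
    if per_agent_limit_mib <= 0:
        raise ValueError("per-agent limit must be positive")
    return int(host_mib // per_agent_limit_mib)


# Hypothetical trace: mostly idle at 200 MiB with one tool-call spike.
trace = [200] * 9 + [2000]
ratio = peak_to_average(trace)

# Hypothetical 64000 MiB host, provisioned two different ways.
host_mib = 64000
by_peak = agents_per_host(host_mib, max(trace))
by_average = agents_per_host(host_mib, sum(trace) / len(trace))

print(f"peak-to-average ratio: {ratio:.2f}")
print(f"agents if sized for peak: {by_peak}")
print(f"agents if sized for average: {by_average}")
```

Sizing for the peak leaves most memory idle most of the time, while sizing for the average risks out-of-memory kills during a spike. That gap is the kind of waste the takeaways attribute to static, container-level limits.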
Computer Science > Operating Systems
arXiv:2602.09345 (cs)
[Submitted on 10 Feb 2026 (v1), last revised 21 Feb 2026 (this version, v2)]
Title: AgentCgroup: Understanding and Controlling OS Resources of AI Agents
Authors: Yusheng Zheng, Jiakun Fan, Quanzhi Fu, Yiwei Yang, Wei Zhang, Andi Quinn
Abstract: AI agents are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls within sandboxed containers, each call with distinct resource demands and rapid fluctuations. We present a systematic characterization of OS-level resource dynamics in sandboxed AI coding agents, analyzing 144 software engineering tasks from the SWE-rebench benchmark across two LLMs. Our measurements reveal that (1) OS-level execution (tool calls, container and agent initialization) accounts for 56-74% of end-to-end task latency; (2) memory, not CPU, is the concurrency bottleneck; (3) memory spikes are tool-call-driven, with a peak-to-average ratio of up to 15.4x; and (4) resource demands are highly unpredictable across tasks, runs, and models. Comparing these characteristics against serverless, microservice, and batch workloads, we identify three mismatches in existing resource controls: a granularity mismatch (container-level policies vs. tool-call-level dynamics), a responsiveness mismatch (user-space...