[2508.19073] CARMA: Collocation-Aware Resource Manager
Summary
CARMA is a collocation-aware resource manager that improves GPU utilization for deep learning (DL) workloads while mitigating the risks of out-of-memory (OOM) crashes and performance interference between collocated tasks.
Why It Matters
As deep learning tasks increasingly rely on GPU resources, efficient management of these resources is critical for improving performance and energy efficiency. CARMA addresses common challenges in GPU utilization, making it relevant for researchers and practitioners in the field of distributed computing and machine learning.
Key Takeaways
- CARMA enhances GPU utilization by 54% through informed collocation decisions.
- It reduces out-of-memory crashes and performance interference among tasks.
- The system achieves a 35% reduction in end-to-end execution time for deep learning workloads.
- Energy consumption is decreased by approximately 15% with CARMA's optimizations.
- Fine-grained monitoring and task placement policies are key features of CARMA.
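The utilization-capping placement idea in the takeaways can be sketched as a simple filter-then-pick policy. This is a hypothetical illustration, not CARMA's actual implementation: the `GPUState` fields, cap values, and headroom parameter are illustrative assumptions.

```python
# Hypothetical sketch of a utilization-capped, OOM-aware placement policy
# in the spirit of CARMA. A task is collocated on a GPU only if the
# projected utilization stays under a cap and enough memory headroom
# remains; otherwise the GPU is filtered out as high-risk.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class GPUState:
    gpu_id: int
    util_pct: float    # current SM utilization (0-100)
    mem_used_mb: int   # current memory footprint
    mem_total_mb: int

def place_task(gpus: List[GPUState], task_util_pct: float, task_mem_mb: int,
               util_cap: float = 90.0, mem_headroom_mb: int = 1024) -> Optional[int]:
    """Return the id of the least-loaded GPU that can safely host the task,
    or None if every GPU would exceed the caps (high OOM/interference risk)."""
    candidates = [
        g for g in gpus
        if g.util_pct + task_util_pct <= util_cap
        and g.mem_used_mb + task_mem_mb + mem_headroom_mb <= g.mem_total_mb
    ]
    if not candidates:
        return None  # caller may queue the task or request an exclusive GPU
    return min(candidates, key=lambda g: g.util_pct).gpu_id
```

For example, with one GPU at 80% utilization and one at 40%, a task needing 30% utilization would be filtered away from the first GPU and placed on the second; if no GPU passes both checks, the policy declines to collocate rather than risk an OOM crash.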
Computer Science > Distributed, Parallel, and Cluster Computing
arXiv:2508.19073 (cs)
[Submitted on 26 Aug 2025 (v1), last revised 23 Feb 2026 (this version, v3)]
Title: CARMA: Collocation-Aware Resource Manager
Authors: Ehsan Yousefzadeh-Asl-Miandoab, Florina M. Ciorba, Pınar Tözün
Abstract: GPUs running deep learning (DL) workloads are frequently underutilized. Collocating multiple DL training tasks on the same GPU can improve utilization but introduces two key risks: (1) out-of-memory (OOM) crashes for newly scheduled tasks, and (2) severe performance interference among co-running tasks, which can negate any throughput gains. These issues reduce system robustness, quality of service, and energy efficiency. We present CARMA, a task-level, collocation-aware resource manager for server-scale systems. CARMA addresses collocation challenges via (1) fine-grained monitoring and bookkeeping of GPUs and a collocation risk analysis that filters out the high-risk GPUs; (2) task placement policies that cap GPU utilization to limit OOMs and interference; (3) integration of GPU memory need estimators for DL tasks to minimize OOMs during collocation; and (4) a lightweight recovery method that relaunches jobs crashed due to OOMs. Our evaluation on a DL training workload derived from real-world traces shows that CARMA uses GPUs more effici...
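The fourth mechanism in the abstract, relaunching jobs that crash due to OOM, can be sketched as a small retry wrapper. This is a minimal sketch under assumptions, not CARMA's actual recovery method: the `OOMError` type, the `launch` callback, and the fall-back-to-exclusive-GPU policy on the final attempt are all illustrative.

```python
# Hypothetical sketch of lightweight OOM recovery: a job that crashes
# with an OOM error is relaunched, and the final retry requests an
# exclusive (non-collocated) GPU so the job cannot be starved of memory
# by co-runners. The retry policy here is an assumption for illustration.
class OOMError(RuntimeError):
    """Raised by the launch callback when the job dies of an OOM crash."""

def run_with_oom_recovery(launch, max_retries: int = 2):
    """launch(exclusive: bool) runs the job and raises OOMError on an OOM
    crash. Early attempts run collocated; the last attempt runs exclusively."""
    for attempt in range(max_retries + 1):
        exclusive = attempt == max_retries  # final attempt: no collocation
        try:
            return launch(exclusive=exclusive)
        except OOMError:
            if attempt == max_retries:
                raise  # even an exclusive GPU was not enough; give up
```

Keeping recovery at the relaunch level (rather than checkpoint-aware restart logic) is what makes such a method lightweight: the manager only needs to observe the crash and resubmit the job with a safer placement.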