[2603.28823] Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
arXiv:2603.28823 [cs.PF]
Computer Science > Performance
[Submitted on 29 Mar 2026]

Title: Time is Not Compute: Scaling Laws for Wall-Clock Constrained Training on Consumer GPUs
Authors: Yi Liu

Abstract: Scaling laws relate model quality to compute budget (FLOPs), but practitioners face wall-clock time constraints, not compute budgets. We study optimal model sizing under fixed time budgets from 5 minutes to 24 hours on consumer GPUs (RTX 4090). Across 70+ runs spanning 50M--1031M parameters, we find: (1) at each time budget a U-shaped curve emerges, where too-small models overfit and too-large models undertrain; (2) optimal model size follows $N^* \propto t^{0.60}$, growing \emph{faster} than Chinchilla's $N^* \propto C^{0.50}$, with $\alpha = 0.60 \pm 0.07$ robustly exceeding the compute-optimal exponent across all sensitivity analyses; (3) a \emph{dual U-shape mechanism}: short-budget U-curves arise from compute bottlenecks, while long-budget U-curves emerge from data bottlenecks (overfitting), with an intermediate regime where the U-curve temporarily disappears. These findings have immediate implications for researchers training on consumer hardware, where wall-clock time -- not FLOPs -- is the binding constraint. We release all code, logs, and 70+ experimental configurations.

Subjects: Performance (cs.PF); Artificial Intelligence (cs....
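The abstract's scaling law can be made concrete with a short sketch. This compares how the optimal parameter count grows when the wall-clock budget is scaled up under the paper's fitted exponent ($\alpha = 0.60$) versus Chinchilla's compute exponent ($0.50$). The anchor point (50M parameters optimal at a 5-minute budget) is a hypothetical illustration, not a number reported in the abstract.

```python
# Sketch: optimal model size under a wall-clock scaling law N* ∝ t^alpha.
# The reference point (n_ref=50e6 params optimal at t_ref=5 minutes) is an
# assumed illustration; only the exponents come from the abstract.

def optimal_size(t_minutes, alpha=0.60, n_ref=50e6, t_ref=5.0):
    """Optimal parameter count at wall-clock budget t_minutes, N* ∝ t^alpha."""
    return n_ref * (t_minutes / t_ref) ** alpha

# Scale the budget from 5 minutes to 24 hours (a 288x increase in time):
growth_wallclock = optimal_size(24 * 60) / optimal_size(5)  # 288 ** 0.60
growth_chinchilla = (24 * 60 / 5) ** 0.50                   # 288 ** 0.50

print(f"wall-clock law:  {growth_wallclock:.1f}x larger model")
print(f"Chinchilla law:  {growth_chinchilla:.1f}x larger model")
```

Because $0.60 > 0.50$, the same 288x stretch of the budget prescribes a noticeably larger model under the wall-clock law (roughly 30x vs. 17x at this ratio), which is the abstract's point that time-optimal sizing grows faster than compute-optimal sizing.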