[R] Spectral Compact Training: 172x memory reduction for 70B model training - verified on a Steam Deck (7.24 GB)
About this article
This is a research article about a patent I filed (not self promotion). I am dyslexic so I used AI to help with the writing. I have been working on Spectral Compact Training (SCT). It stores every weight matrix as [ W = U \operatorname{diag}(s) VT ] and trains directly through the small spectral factors. Never builds the dense matrix. Exact gradients via standard backprop. QR retraction keeps U and V orthonormal after each optimizer step. Results on a 70B-class architecture (80 layers, hidden...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket