[2604.00733] Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction
Computer Science > Machine Learning

arXiv:2604.00733 (cs)

[Submitted on 1 Apr 2026]

Title: Spectral Compact Training: Pre-Training Large Language Models via Permanent Truncated SVD and Stiefel QR Retraction

Authors: Björn Roman Kohlberger (EctoSpace, Dublin, Ireland)

Abstract: The memory wall remains the primary bottleneck for training large language models on consumer hardware. We introduce Spectral Compact Training (SCT), a method that replaces dense weight matrices with permanent truncated SVD factors W = U diag(s) V^T, where the full dense matrix is never materialized during training or inference. Gradients flow through the compact spectral factors via standard backpropagation, and U, V are retracted to the Stiefel manifold via QR decomposition after each optimizer step. SCT achieves up to 199x memory reduction per MLP layer at rank 32, enabling full training steps of 70B-parameter architectures on a Steam Deck handheld (7.2 GB peak memory vs. 1,245 GB for dense FP32 training with Adam). Rank-sweep experiments on SmolLM2-1.7B (ranks 32-256, 2000 steps, NVIDIA A100) show that all tested ranks converge to the same loss floor (~4.2-4.5), identifying the learning rate schedule -- not MLP rank -- as the primary bottleneck. Rank 128 emerges as th...
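The factorized parameterization and QR retraction described in the abstract can be sketched in a few lines. The following is a minimal illustration only, assuming a PyTorch implementation; the class name SpectralCompactLinear, the initialization, and the qr_retract helper are hypothetical and not taken from the paper.

```python
# Minimal sketch (assumed PyTorch): a layer stored only as truncated SVD factors
# W = U diag(s) V^T, trained by ordinary backprop through the factors, with U and V
# retracted to the Stiefel manifold (orthonormal columns) via QR after each step.
import torch
import torch.nn as nn


class SpectralCompactLinear(nn.Module):
    """Linear layer kept as compact factors; the dense W is never materialized."""

    def __init__(self, in_features: int, out_features: int, rank: int):
        super().__init__()
        self.U = nn.Parameter(torch.empty(out_features, rank))
        self.s = nn.Parameter(torch.ones(rank))
        self.V = nn.Parameter(torch.empty(in_features, rank))
        nn.init.orthogonal_(self.U)  # start on the Stiefel manifold
        nn.init.orthogonal_(self.V)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T = x V diag(s) U^T, computed factor by factor.
        return ((x @ self.V) * self.s) @ self.U.t()

    @torch.no_grad()
    def qr_retract(self):
        # Reduced QR gives orthonormal columns, projecting U and V back
        # onto the Stiefel manifold after the optimizer update.
        self.U.copy_(torch.linalg.qr(self.U, mode="reduced").Q)
        self.V.copy_(torch.linalg.qr(self.V, mode="reduced").Q)


# Usage sketch: backprop through the factors, optimizer step, then retraction.
layer = SpectralCompactLinear(in_features=2048, out_features=8192, rank=128)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)

x = torch.randn(4, 2048)
loss = layer(x).pow(2).mean()
loss.backward()
opt.step()
layer.qr_retract()
```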