[2501.07237] Gradient Compression Beyond Low-Rank: Wavelet Subspaces Compact Optimizer States
Computer Science > Machine Learning
arXiv:2501.07237 (cs)
[Submitted on 13 Jan 2025 (v1), last revised 30 Mar 2026 (this version, v4)]

Title: Gradient Compression Beyond Low-Rank: Wavelet Subspaces Compact Optimizer States
Authors: Ziqing Wen, Ping Luo, Jiahuan Wang, Kun Yuan, Dongsheng Li, Tao Sun

Abstract: Large language models (LLMs) have shown impressive performance across a range of natural language processing tasks. However, their vast number of parameters introduces significant memory challenges during training, particularly when using memory-intensive optimizers like Adam. Existing memory-efficient algorithms often rely on techniques such as singular value decomposition projection or weight freezing. While these approaches help alleviate memory constraints, they generally produce suboptimal results compared to full-rank updates. In this paper, we investigate memory-efficient methods beyond low-rank training, proposing a novel solution called Gradient Wavelet Transform (GWT), which applies wavelet transforms to gradients to significantly reduce the memory required to maintain optimizer states. We demonstrate that GWT can be seamlessly integrated with memory-intensive optimizers, enabling efficient training while maintaining performance. Through extensive experiments on both pr...
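To make the idea concrete, here is a minimal sketch of the kind of scheme the abstract describes: transform the gradient into a wavelet basis, keep the Adam moment buffers only in the compressed (approximation) subspace, and inverse-transform the resulting update. This uses a one-level Haar transform in NumPy for illustration; the choice of wavelet, the number of levels, the `CompressedAdam` class, and the decision to drop detail coefficients are all assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def haar_fwd(g):
    # One-level orthonormal Haar DWT along the last axis (assumes even length):
    # approximation = scaled pairwise sums, detail = scaled pairwise differences.
    a = (g[..., ::2] + g[..., 1::2]) / np.sqrt(2.0)
    d = (g[..., ::2] - g[..., 1::2]) / np.sqrt(2.0)
    return a, d

def haar_inv(a, d):
    # Inverse of haar_fwd: interleave reconstructed even/odd samples.
    out = np.empty(a.shape[:-1] + (2 * a.shape[-1],))
    out[..., ::2] = (a + d) / np.sqrt(2.0)
    out[..., 1::2] = (a - d) / np.sqrt(2.0)
    return out

class CompressedAdam:
    """Adam whose moment buffers live in the Haar approximation subspace,
    halving optimizer-state memory. A hypothetical sketch of a GWT-style
    optimizer, not the authors' implementation."""

    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.lr, self.betas, self.eps = lr, betas, eps
        self.m = self.v = None
        self.t = 0

    def step(self, param, grad):
        a, _ = haar_fwd(grad)  # compress: keep only approximation coefficients
        if self.m is None:
            self.m, self.v = np.zeros_like(a), np.zeros_like(a)
        self.t += 1
        b1, b2 = self.betas
        # Standard Adam moment updates, but on the compressed coefficients.
        self.m = b1 * self.m + (1 - b1) * a
        self.v = b2 * self.v + (1 - b2) * a ** 2
        m_hat = self.m / (1 - b1 ** self.t)
        v_hat = self.v / (1 - b2 ** self.t)
        upd_a = m_hat / (np.sqrt(v_hat) + self.eps)
        # Decompress: inverse transform with zeroed detail coefficients.
        update = haar_inv(upd_a, np.zeros_like(upd_a))
        return param - self.lr * update
```

The moment buffers `m` and `v` are half the size of the parameter tensor here, which is where the memory saving comes from; a multi-level transform would compress further, at the cost of a coarser update subspace.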