[2602.16456] Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC


arXiv - Machine Learning

Summary

This paper introduces LoRSum, a memory-efficient subroutine for Low-Rank Adaptation (LoRA) that narrows the gap between full-step training with low-rank projections (SVDLoRA) and standard LoRA fine-tuning, without computing full-matrix SVD projections.

Why It Matters

As machine learning models grow in complexity, efficient training methods become crucial. This research addresses the limitations of existing LoRA techniques, offering a more memory-efficient alternative that can improve performance while reducing computational overhead.

Key Takeaways

  • LoRSum casts LoRA optimization as a proximal sub-problem solved with alternating least squares updates, which the authors prove is an implicit block power method.
  • The method avoids full-matrix SVD projections entirely.
  • Experiments show that LoRSum can match or exceed LoRA baselines at modest compute cost.
  • A scaled variant incorporates structured metrics such as K-FAC and Shampoo for preconditioned steps.
  • LoRSum can also be used to maintain a low-rank momentum buffer, enhancing its versatility.
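To make the first takeaway concrete, here is a minimal sketch (our own illustration, not the paper's code) of how a proximal sub-problem of the form min over B, A of ||B @ A - S||² can be solved by alternating least squares, where S stands for a full-matrix step to be compressed to rank r. The function and variable names are ours; the paper's result is that iterations of this kind act as an implicit block power method, converging toward the same dominant subspace a truncated SVD would give:

```python
import numpy as np

rng = np.random.default_rng(0)

def als_low_rank(S, r, iters=200):
    """Alternating least squares for min_{B,A} ||B @ A - S||_F^2.

    S : (m, n) target matrix (e.g. a full gradient step).
    Returns factors B (m, r) and A (r, n).
    """
    m, n = S.shape
    A = rng.standard_normal((r, n))
    for _ in range(iters):
        # B-step: least squares in B with A fixed
        B = S @ np.linalg.pinv(A)
        # A-step: least squares in A with B fixed
        A = np.linalg.pinv(B) @ S
    return B, A

# Sanity check against the truncated SVD that ALS avoids computing
S = rng.standard_normal((20, 12))
r = 3
B, A = als_low_rank(S, r)
U, s, Vt = np.linalg.svd(S, full_matrices=False)
svd_err = np.linalg.norm(S - (U[:, :r] * s[:r]) @ Vt[:r])
als_err = np.linalg.norm(S - B @ A)
print(als_err, svd_err)
```

After enough iterations the ALS residual essentially matches the optimal rank-r (SVD) residual, which is the point of the subspace-iteration view: the expensive SVD projection is replaced by two cheap least-squares solves per iteration.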

Computer Science > Machine Learning
arXiv:2602.16456 (cs) · Submitted on 18 Feb 2026

Title: Beyond SGD, Without SVD: Proximal Subspace Iteration LoRA with Diagonal Fractional K-FAC
Authors: Abdulla Jasem Almansoori, Maria Ivanova, Andrey Veprikov, Aleksandr Beznosikov, Samuel Horváth, Martin Takáč

Abstract: Low-Rank Adaptation (LoRA) fine-tunes large models by learning low-rank updates on top of frozen weights, dramatically reducing trainable parameters and memory. In this work, we address the gap between training with full steps with low-rank projections (SVDLoRA) and LoRA fine-tuning. We propose LoRSum, a memory-efficient subroutine that closes this gap for gradient descent by casting LoRA optimization as a proximal sub-problem and solving it efficiently with alternating least squares updates, which we prove to be an implicit block power method. We recover several recently proposed preconditioning methods for LoRA as special cases, and show that LoRSum can also be used for updating a low-rank momentum. In order to address full steps with preconditioned gradient descent, we propose a scaled variant of LoRSum that uses structured metrics such as K-FAC and Shampoo, and we show that storing the dia...
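The abstract's mention of K-FAC with stored diagonals can be illustrated with a short sketch. K-FAC approximates the curvature of a linear layer as a Kronecker product of an input-side factor and an output-gradient-side factor; keeping only the diagonals of the two factors stores two vectors instead of two matrices. Everything below (function name, the fractional power `p`, the damping `eps`) is our own illustrative choice, not code or values from the paper:

```python
import numpy as np

def diag_kfac_precondition(grad, acts, gout, p=0.5, eps=1e-8):
    """Apply a diagonal Kronecker-factored preconditioner to a gradient.

    grad : (out, in) weight gradient of a linear layer.
    acts : (batch, in) layer inputs.
    gout : (batch, out) gradients w.r.t. the layer outputs.
    p    : fractional power of the inverse metric (illustrative choice).
    """
    a_diag = (acts ** 2).mean(axis=0)   # diagonal of input factor, shape (in,)
    g_diag = (gout ** 2).mean(axis=0)   # diagonal of grad factor, shape (out,)
    # With diagonal factors, (G ⊗ A)^(-p) applied to vec(grad)
    # reduces to an elementwise rescaling of the gradient matrix.
    scale = np.outer(g_diag + eps, a_diag + eps) ** (-p)
    return grad * scale

# Tiny usage example with random stand-in tensors
rng = np.random.default_rng(1)
acts = rng.standard_normal((32, 8))
gout = rng.standard_normal((32, 4))
grad = gout.T @ acts / 32
pg = diag_kfac_precondition(grad, acts, gout)
print(pg.shape)
```

The appeal of the diagonal variant is memory: for an out × in layer, the full Kronecker factors cost out² + in² entries, while their diagonals cost only out + in.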

