[2602.13073] LCSB: Layer-Cyclic Selective Backpropagation for Memory-Efficient On-Device LLM Fine-Tuning

arXiv - Machine Learning

Summary

The paper presents Layer-Cyclic Selective Backpropagation (LCSB), a method for memory-efficient fine-tuning of large language models (LLMs) on mobile devices. By computing gradients for only a subset of transformer layers at each step, LCSB speeds up fine-tuning at the cost of a small loss in quality.

Why It Matters

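As LLMs increasingly run on mobile devices, reducing the memory and compute cost of fine-tuning them on-device becomes crucial. Memory-efficient backpropagation (MeBP) already fits first-order fine-tuning into less than 1GB of memory, but it still runs the backward pass through every transformer layer at every step. LCSB cuts that backward cost while giving up only a small amount of model quality, making on-device adaptation more practical on resource-constrained hardware.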

Key Takeaways

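  • LCSB computes gradients for only a subset of transformer layers at each step; residual connections keep gradients flowing past the skipped layers, and AdamW momentum continues to update them implicitly.
  • Achieves up to a 1.40x speedup with less than 2% quality degradation across five models and three tasks.
  • Shows better stability in 4-bit quantized settings: a 3B model that diverges under full backpropagation converges smoothly with LCSB.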
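
The summary says LCSB computes gradients for only a subset of layers per step, and the name suggests the subset rotates cyclically. The paper's actual selection rule is not described on this page, so the following is only a minimal sketch assuming a simple round-robin over windows of consecutive layers. The function `cyclic_layer_schedule` and its parameters are hypothetical.

```python
def cyclic_layer_schedule(num_layers: int, layers_per_step: int, step: int) -> list[int]:
    """Return the indices of the layers whose gradients are computed at `step`.

    Hypothetical round-robin policy (an assumption, not taken from the paper):
    a window of `layers_per_step` consecutive layers slides forward by its own
    width each step and wraps around after the last layer.
    """
    if not 0 < layers_per_step <= num_layers:
        raise ValueError("layers_per_step must be between 1 and num_layers")
    start = (step * layers_per_step) % num_layers
    return [(start + i) % num_layers for i in range(layers_per_step)]


if __name__ == "__main__":
    # Example: 8 layers, 3 selected per step.
    for step in range(4):
        print(step, cyclic_layer_schedule(8, 3, step))
```

With 8 layers and 3 selected per step, steps 0 to 3 select [0, 1, 2], [3, 4, 5], [6, 7, 0] and [1, 2, 3]. Under this assumed policy, any `num_layers` consecutive steps select every layer exactly `layers_per_step` times, so each step does roughly `layers_per_step / num_layers` of the per-layer gradient work of full backpropagation.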

arXiv:2602.13073 (Computer Science, Machine Learning). Submitted on 13 Feb 2026.

Title: LCSB: Layer-Cyclic Selective Backpropagation for Memory-Efficient On-Device LLM Fine-Tuning

Authors: Juneyoung Park, Eunbeen Yoon, Seongwan Kim. Jaeho Lee

Abstract: Memory-efficient backpropagation (MeBP) has enabled first-order fine-tuning of large language models (LLMs) on mobile devices with less than 1GB memory. However, MeBP requires backward computation through all transformer layers at every step, where weight decompression alone accounts for 32–42% of backward time. We propose Layer-Cyclic Selective Backpropagation (LCSB), which computes gradients for only a subset of layers per step. Our key insight is that residual connections guarantee gradient flow through identity paths, while AdamW momentum provides implicit updates for non-selected layers. We interpret LCSB as Block Coordinate Descent on the LoRA parameter space, providing theoretical justification for convergence. LCSB achieves up to 1.40× speedup with less than 2% quality degradation across five models and three tasks. Surprisingly, in 4-bit quantized settings, LCSB exhibits superior stability: a 3B model that completely diverges under full backpropagation converges smoothly with LCSB, suggesting an implicit regularization...
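
The abstract's claim that AdamW momentum provides implicit updates for non-selected layers can be illustrated with a toy block coordinate descent problem. This is a sketch under loud assumptions, not the authors' implementation: each "layer" is reduced to one scalar parameter with loss (w - c)^2, two of four layers are selected per step in round-robin order, and a non-selected layer is assumed to receive a zero gradient while the optimizer still steps it. The names `TARGETS`, `adamw_step` and `train`, and all hyperparameter values, are hypothetical.

```python
import math

# Hypothetical toy problem: four "layers", each reduced to one scalar
# parameter with loss (w - c)^2. Values and hyperparameters are illustrative.
TARGETS = [1.0, -2.0, 0.5, 3.0]


def loss(ws):
    return sum((w - c) ** 2 for w, c in zip(ws, TARGETS))


def adamw_step(w, g, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update (decoupled weight decay) for a scalar; returns (w, m, v)."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w)
    return w, m, v


def train(num_steps=500, layers_per_step=2):
    """Block coordinate descent: only the selected blocks get a fresh gradient."""
    n = len(TARGETS)
    ws, ms, vs = [0.0] * n, [0.0] * n, [0.0] * n
    history = []
    for step in range(num_steps):
        start = (step * layers_per_step) % n
        selected = {(start + i) % n for i in range(layers_per_step)}
        for layer in range(n):
            # Assumption: a non-selected layer sees g = 0, yet AdamW still
            # moves it using the momentum m accumulated on earlier steps.
            g = 2.0 * (ws[layer] - TARGETS[layer]) if layer in selected else 0.0
            ws[layer], ms[layer], vs[layer] = adamw_step(
                ws[layer], g, ms[layer], vs[layer], step + 1
            )
        history.append(list(ws))
    return ws, history


if __name__ == "__main__":
    ws, history = train()
    print("layer 0 after steps 0 and 1:", history[0][0], history[1][0])
    print("initial loss:", loss([0.0] * len(TARGETS)), "final loss:", loss(ws))
```

Layer 0 is selected at step 0 but not at step 1, yet its value still changes at step 1 because its momentum term is nonzero; this is the kind of implicit update the abstract attributes to AdamW. Whether LCSB actually passes zero gradients to the optimizer, or handles non-selected LoRA parameters some other way, is not stated on this page.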
