[2602.17510] LORA-CRAFT: Cross-layer Rank Adaptation via Frozen Tucker Decomposition of Pre-trained Attention Weights
Summary
The paper presents LORA-CRAFT, a parameter-efficient fine-tuning method that applies Tucker tensor decomposition to pre-trained attention weights stacked across layers, freezes the resulting factors, and trains only small adaptation matrices, achieving competitive performance with fewer trainable parameters.
Why It Matters
LORA-CRAFT addresses the growing need for efficient model fine-tuning in machine learning, particularly in natural language processing. By reducing the number of parameters required for adaptation, it enhances the feasibility of deploying large models in resource-constrained environments, making advanced AI more accessible.
Key Takeaways
- LORA-CRAFT uses Tucker decomposition to optimize fine-tuning of transformer models.
- The method adapts pre-trained weights with significantly fewer parameters compared to existing techniques.
- Experiments show that LORA-CRAFT performs competitively on the GLUE benchmark.
- This approach can facilitate the deployment of large models in environments with limited resources.
- The technique bridges two lines of tensor-based PEFT work: methods that decompose gradient updates (LoTR, SuperLoRA) and methods that decompose pre-trained weights (PiSSA).
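To make the parameter savings concrete, here is a hedged back-of-envelope count. The hidden size, layer count, and ranks below are illustrative placeholders, not figures from the paper: LoRA trains two rectangular matrices per adapted weight per layer, whereas a cross-layer Tucker adaptation of the kind described trains only one small square matrix per Tucker factor for the whole stack.

```python
# Hypothetical dimensions (not from the paper): BERT-base-like model.
d, L = 768, 12            # hidden size, number of transformer layers
r_lora = 8                # typical LoRA rank

# LoRA: B (d x r) and A (r x d) per adapted matrix, per layer.
lora_params = L * 2 * d * r_lora
print(lora_params)        # 147456

# Cross-layer Tucker adaptation: one square matrix per factor,
# shared across all L layers (ranks are illustrative assumptions).
r1, r2, r3 = 12, 64, 64   # ranks along (layers, rows, cols)
craft_params = r1**2 + r2**2 + r3**2
print(craft_params)       # 8336
```

Under these assumed ranks the trainable-parameter count drops by more than an order of magnitude, which is the kind of saving the summary refers to; the paper's actual configurations may differ.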
Computer Science > Machine Learning
arXiv:2602.17510 (cs)
[Submitted on 19 Feb 2026]
Title: LORA-CRAFT: Cross-layer Rank Adaptation via Frozen Tucker Decomposition of Pre-trained Attention Weights
Authors: Kasun Dewage, Marianna Pensky, Suranadi De Silva, Shankadeep Mondal
Abstract: We introduce CRAFT (Cross-layer Rank Adaptation via Frozen Tucker), a parameter-efficient fine-tuning (PEFT) method that applies Tucker tensor decomposition to pre-trained attention weight matrices stacked across transformer layers and trains only small square adaptation matrices on the resulting frozen Tucker factors. Existing tensor-based PEFT methods decompose gradient updates: LoTR applies Tucker decomposition with shared factor matrices, while SuperLoRA groups and reshapes $\Delta W$ across layers before applying Tucker decomposition. Separately, methods like PiSSA apply SVD to pre-trained weights but operate independently per layer. CRAFT bridges these two lines of work: it performs full Tucker decomposition via Higher-Order SVD (HOSVD) directly on pre-trained weights organized as cross-layer 3D tensors, freezes all resulting factors, and adapts the model through lightweight trainable transformations applied to each factor matrix. Experiments on the ...
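The pipeline the abstract describes can be sketched numerically: stack one attention projection across layers into a 3D tensor, compute a truncated HOSVD (one SVD per mode-unfolding), keep the core and factor matrices frozen, and reconstruct with a small square matrix multiplied into each factor. The dimensions, ranks, and identity initialization below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def hosvd3(T, ranks):
    """Truncated Tucker decomposition of a 3D tensor via HOSVD."""
    factors = []
    for mode, r in enumerate(ranks):
        # mode-n unfolding: bring axis `mode` to the front, flatten the rest
        unf = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        U, _, _ = np.linalg.svd(unf, full_matrices=False)
        factors.append(U[:, :r])  # leading r left singular vectors
    # core = T  x_1 U1^T  x_2 U2^T  x_3 U3^T  (mode products with U_n^T)
    core = T
    for mode, U in enumerate(factors):
        core = np.moveaxis(
            np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Stack the query-projection weights of L hypothetical layers (random stand-ins).
L, d = 4, 32
rng = np.random.default_rng(0)
W = rng.standard_normal((L, d, d))       # (layers, d_out, d_in)

ranks = (2, 8, 8)                        # illustrative Tucker ranks
G, (U1, U2, U3) = hosvd3(W, ranks)       # frozen core and factors

# Trainable part: one small square matrix per factor (identity at init,
# so the adapted model starts from the truncated reconstruction).
A1, A2, A3 = np.eye(ranks[0]), np.eye(ranks[1]), np.eye(ranks[2])

def reconstruct(G, U1, U2, U3, A1, A2, A3):
    """Mode products of the frozen core with the adapted factors U_n @ A_n."""
    T = np.tensordot(U1 @ A1, G, axes=(1, 0))
    T = np.moveaxis(np.tensordot(U2 @ A2, np.moveaxis(T, 1, 0), axes=1), 0, 1)
    T = np.moveaxis(np.tensordot(U3 @ A3, np.moveaxis(T, 2, 0), axes=1), 0, 2)
    return T

W_hat = reconstruct(G, U1, U2, U3, A1, A2, A3)
print(W_hat.shape)                       # (4, 32, 32)
```

Only `A1`, `A2`, `A3` would receive gradients during fine-tuning; with the ranks above that is $2^2 + 8^2 + 8^2 = 132$ trainable parameters against $4 \cdot 32 \cdot 32 = 4096$ frozen weights per stack. This is a minimal sketch of HOSVD-based cross-layer adaptation, not the authors' implementation.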