[2603.29535] Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generative Vision Models on edge
Computer Science > Computer Vision and Pattern Recognition

arXiv:2603.29535 (cs) [Submitted on 31 Mar 2026]

Title: Quantization with Unified Adaptive Distillation to enable multi-LoRA based one-for-all Generative Vision Models on edge

Authors: Sowmya Vajrala, Aakash Parmar, Prasanna R, Sravanth Kodavanti, Manjunath Arveti, Srinivas Soumitri Miriyala, Ashok Senapati

Abstract: Generative Artificial Intelligence (GenAI) features such as image editing, object removal, and prompt-guided image transformation are increasingly integrated into mobile applications. However, deploying Large Vision Models (LVMs) for such tasks on resource-constrained devices remains challenging due to their high memory and compute requirements. While Low-Rank Adapters (LoRAs) enable parameter-efficient task adaptation, existing mobile deployment pipelines typically compile a separate model binary for each LoRA plus a copy of the foundation model, resulting in redundant storage and increased runtime overhead. In this work, we present a unified framework for enabling multi-task GenAI inference on edge devices using a single shared model. Our key idea is to treat LoRA weights as runtime inputs rather than embedding them into the compiled model graph, allowing dynamic task switching at runtime wi...
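The abstract's key idea, treating LoRA weights as runtime inputs to a single shared model rather than folding them into each compiled graph, can be sketched as follows. This is a minimal NumPy illustration under standard LoRA assumptions (W' = W + (alpha/r)·B·A); the function and adapter names are hypothetical, not the paper's implementation.

```python
import numpy as np

def lora_linear(x, W, lora=None, alpha=16):
    """Frozen base linear layer with the LoRA adapter supplied at call time.

    x: (batch, d_in) input; W: (d_out, d_in) frozen foundation weight.
    lora: optional (A, B) pair, A: (r, d_in), B: (d_out, r).
    Because the adapter is an argument rather than baked into W, one
    compiled graph can serve many tasks: switching tasks means feeding
    different (A, B) tensors, not loading a different model binary.
    """
    y = x @ W.T                          # shared foundation-model path
    if lora is not None:
        A, B = lora
        r = A.shape[0]
        y = y + (alpha / r) * (x @ A.T @ B.T)   # low-rank task delta
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
W = rng.standard_normal((4, 8))          # frozen base weight, stored once
# Two hypothetical task adapters (rank r = 2), switched at runtime:
A1, B1 = rng.standard_normal((2, 8)), rng.standard_normal((4, 2))
A2, B2 = rng.standard_normal((2, 8)), rng.standard_normal((4, 2))

y_base  = lora_linear(x, W)              # foundation model only
y_task1 = lora_linear(x, W, (A1, B1))    # task 1, same graph
y_task2 = lora_linear(x, W, (A2, B2))    # task 2, no recompilation
```

The output of the runtime-input formulation matches the conventional merged weight W + (alpha/r)·B·A exactly, so dynamic switching trades no accuracy for the storage savings the abstract describes.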