[2603.22943] PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference
Computer Science > Artificial Intelligence
arXiv:2603.22943 (cs)
[Submitted on 24 Mar 2026]

Title: PersonalQ: Select, Quantize, and Serve Personalized Diffusion Models for Efficient Inference
Authors: Qirui Wang, Qi Guo, Yiding Sun, Junkai Yang, Dongxu Zhang, Shanmin Pang, Qing Guo

Abstract: Personalized text-to-image generation lets users fine-tune diffusion models into repositories of concept-specific checkpoints, but serving these repositories efficiently is difficult for two reasons: natural-language requests are often ambiguous and can be misrouted to visually similar checkpoints, and standard post-training quantization can distort the fragile representations that encode personalized concepts. We present PersonalQ, a unified framework that connects checkpoint selection and quantization through a shared signal -- the checkpoint's trigger token. Check-in performs intent-aligned selection by combining intent-aware hybrid retrieval with LLM-based reranking over checkpoint context, and asks a brief clarification question only when multiple intents remain plausible; it then rewrites the prompt by inserting the selected checkpoint's canonical trigger. Complementing this, Trigger-Aware Quantization (TAQ) applies trigger-aware mixed precision in cross-attention, preserving trigger-conditioned key...
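The selection-and-rewrite step described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the function names, the checkpoint fields (`name`, `description`, `trigger`), and the simple word-overlap lexical score are invented for illustration, and the LLM-based reranking and clarification stages of Check-in are omitted. The paper's actual hybrid retrieval is not specified on this page.

```python
# Hypothetical sketch of hybrid (dense + lexical) checkpoint selection followed by
# trigger-based prompt rewriting. Not the paper's implementation.

def lexical_score(query: str, description: str) -> float:
    """Fraction of query words that also appear in the checkpoint description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / max(len(q), 1)

def select_and_rewrite(prompt, checkpoints, dense_scores, alpha=0.5):
    """checkpoints: list of dicts with 'name', 'description', 'trigger' (invented schema).
    dense_scores: precomputed embedding similarities, one per checkpoint.
    Returns the selected checkpoint and the prompt rewritten with its canonical trigger."""
    best, best_score = None, float("-inf")
    for ckpt, dense in zip(checkpoints, dense_scores):
        # Hybrid score: weighted sum of dense similarity and lexical overlap.
        score = alpha * dense + (1 - alpha) * lexical_score(prompt, ckpt["description"])
        if score > best_score:
            best, best_score = ckpt, score
    # Rewrite the prompt by splicing in the selected checkpoint's canonical trigger.
    return best, prompt.replace(best["name"], best["trigger"])

checkpoints = [
    {"name": "my dog", "description": "a corgi pet dog photos", "trigger": "<sks-dog>"},
    {"name": "my cat", "description": "tabby cat photos", "trigger": "<sks-cat>"},
]
best, rewritten = select_and_rewrite(
    "a photo of my dog on the beach", checkpoints, dense_scores=[0.9, 0.2]
)
print(best["trigger"], "|", rewritten)
```

The key idea mirrored here is that the trigger token is the shared signal: once a checkpoint is chosen, its canonical trigger is inserted into the prompt so the fine-tuned model is conditioned on the exact token it was trained with.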