[2603.04822] VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Computer Science > Artificial Intelligence
arXiv:2603.04822 (cs) [Submitted on 5 Mar 2026]
Title: VISA: Value Injection via Shielded Adaptation for Personalized LLM Alignment
Authors: Jiawei Chen, Tianzhuo Yang, Guoxi Zhang, Jiaming Ji, Yaodong Yang, Juntao Dai
Abstract: Aligning Large Language Models (LLMs) with nuanced human values remains a critical challenge, as existing methods such as Reinforcement Learning from Human Feedback (RLHF) often handle only coarse-grained attributes. In practice, fine-tuning LLMs on task-specific datasets to optimize value alignment inevitably incurs an alignment tax: the model's pre-calibrated value system drifts significantly as it absorbs latent biases from the training data, while the fine-tuning process also causes severe hallucinations and semantic information loss in generated responses. To address this, we propose VISA (Value Injection via Shielded Adaptation), a closed-loop framework designed to navigate this trade-off. VISA's architecture features a high-precision value detector, a semantic-to-value translator, and a core value-rewriter. The value-rewriter is trained via Group Relative Policy Optimization (GRPO) with a composite reward function that simultaneously optimizes for fine-grained value precision and the preservation of semantic integrity. By learning an op...
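The GRPO training signal described in the abstract can be sketched in miniature. The snippet below is a minimal illustration, not the authors' implementation: `value_score` and `semantic_score` are hypothetical stand-ins for the paper's value-detector and semantic-integrity signals, and `alpha` is an assumed blending weight. It shows the two GRPO ingredients the abstract names: a composite reward, and group-relative normalization of rewards across sampled responses to the same prompt.

```python
import statistics

def composite_reward(value_score: float, semantic_score: float,
                     alpha: float = 0.5) -> float:
    """Blend fine-grained value precision with semantic preservation.

    Both scores and the weight `alpha` are illustrative assumptions,
    not quantities taken from the paper.
    """
    return alpha * value_score + (1 - alpha) * semantic_score

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: mean-center and std-scale rewards
    within one prompt's group of sampled responses."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero std
    return [(r - mean) / std for r in rewards]

# Four sampled rewrites of one response, each scored on both axes.
rewards = [composite_reward(v, s) for v, s in
           [(0.9, 0.8), (0.4, 0.9), (0.7, 0.3), (0.2, 0.2)]]
advantages = grpo_advantages(rewards)
```

Because advantages are computed relative to the group mean, no separate value network is needed; responses that score well on both axes receive positive advantage, the rest negative.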