[2603.01185] Token-level Data Selection for Safe LLM Fine-tuning
Computer Science > Computation and Language
arXiv:2603.01185 (cs)
[Submitted on 1 Mar 2026]

Title: Token-level Data Selection for Safe LLM Fine-tuning
Authors: Yanping Li, Zhening Liu, Zijian Li, Zehong Lin, Jun Zhang

Abstract: Fine-tuning large language models (LLMs) on custom datasets has become a standard approach for adapting these models to specific domains and applications. However, recent studies have shown that such fine-tuning can lead to significant degradation in the model's safety. Existing defense methods operate at the sample level and often suffer from an unsatisfactory trade-off between safety and utility. To address this limitation, we perform a systematic token-level diagnosis of safety degradation during fine-tuning. Based on this, we propose token-level data selection for safe LLM fine-tuning (TOSS), a novel framework that quantifies the safety risk of each token by measuring the loss difference between a safety-degraded model and a utility-oriented model. This token-level granularity enables accurate identification and removal of unsafe tokens, thereby preserving valuable task-specific information. In addition, we introduce a progressive refinement strategy, TOSS-Pro, which iteratively enhances the safety-degraded model's ability to identify unsafe tokens. Extensive experiments demonstrate that our approach rob...
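The token-level scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-token losses from the two reference models are already available as lists of floats, that the risk score is the utility-oriented loss minus the safety-degraded loss (the sign convention is an assumption), and that selection drops a fixed fraction of the highest-scoring tokens. The function names and the `drop_fraction` parameter are hypothetical.

```python
from typing import List


def token_risk_scores(loss_degraded: List[float],
                      loss_utility: List[float]) -> List[float]:
    """Per-token risk as a loss difference between two reference models.

    Assumed convention: a token that the safety-degraded model predicts
    much better than the utility-oriented model gets a high score.
    """
    if len(loss_degraded) != len(loss_utility):
        raise ValueError("loss sequences must have the same length")
    return [u - d for d, u in zip(loss_degraded, loss_utility)]


def keep_mask(scores: List[float], drop_fraction: float) -> List[bool]:
    """Mark the top `drop_fraction` of tokens by risk score as removed.

    `drop_fraction` is a hypothetical hyperparameter for this sketch.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    n_drop = int(len(scores) * drop_fraction)
    # Rank token positions from highest to lowest risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(ranked[:n_drop])
    return [i not in dropped for i in range(len(scores))]


def masked_mean_loss(losses: List[float], mask: List[bool]) -> float:
    """Average the fine-tuning loss over kept tokens only."""
    kept = [loss for loss, keep in zip(losses, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Toy per-token losses; values are illustrative, not from the paper.
    degraded = [0.5, 0.25, 2.0, 1.0]
    utility = [0.5, 2.25, 2.0, 1.5]
    scores = token_risk_scores(degraded, utility)
    mask = keep_mask(scores, drop_fraction=0.5)
    print(scores, mask)
```

In a real training loop the same mask would typically be applied to the per-token cross-entropy before averaging, so that removed tokens contribute no gradient while the remaining tokens of the sample are still used.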