[2603.02262] Silent Sabotage During Fine-Tuning: Few-Shot Rationale

[2603.02262] Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

arXiv - Machine Learning March 04, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.02262: Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

Computer Science > Cryptography and Security arXiv:2603.02262 (cs) [Submitted on 28 Feb 2026] Title:Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs Authors:Jingyuan Xie, Wenjie Wang, Ji Wu, Jiandong Gao View a PDF of the paper titled Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs, by Jingyuan Xie and 3 other authors View PDF HTML (experimental) Abstract:Supervised fine-tuning (SFT) is essential for the development of medical large language models (LLMs), yet prior poisoning studies have mainly focused on the detectable backdoor attacks. We propose a novel poisoning attack targeting the reasoning process of medical LLMs during SFT. Unlike backdoor attacks, our method injects poisoned rationales into few-shot training data, leading to stealthy degradation of model performance on targeted medical topics. Results showed that knowledge overwriting was ineffective, while rationale poisoning caused significant decline on the accuracy of the target subject, as long as no correct samples of the same subject appear in the dataset. A minimum number and ratio of poisoned samples was needed to carry out an effective and stealthy attack, which was more efficient and accurate than catastrophic forgetting. We demonstrate though this study the risk of SFT-stage poisoning, hoping to spur more studies of defense in the sensitive medical domain. Subjects: Cryptography and Security (cs.CR); Artificial Intelligen...

Originally published on March 04, 2026. Curated by AI News.

Llms

8 free AI courses from Anthropic’s Claude platform with certificates

AI News - General · 19 minutes ago

Llms

Anthropic launches Claude Managed Agents — composable APIs for shipping production AI agents 10x faster. Notion, Rakuten, Asana, and Sentry already in production.

Anthropic launches Claude Managed Agents in public beta — composable APIs for shipping production AI agents 10x faster Handles sandboxing...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Llms

6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous

Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow. Here's th...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Llms

Gemini gets major upgrade towards interactive AI learning

Google has updated its Gemini AI assistant to generate three-dimensional models and live simulations, allowing users to interact with com...

AI News - General · 3 min · about 2 hours ago

[2603.02262] Silent Sabotage During Fine-Tuning: Few-Shot Rationale Poisoning of Compact Medical LLMs

About this article

Related Articles

8 free AI courses from Anthropic’s Claude platform with certificates

Anthropic launches Claude Managed Agents — composable APIs for shipping production AI agents 10x faster. Notion, Rakuten, Asana, and Sentry already in production.

6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous

Gemini gets major upgrade towards interactive AI learning

No comments

Stay updated with AI News