[2604.05502] AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
Computer Science > Cryptography and Security

arXiv:2604.05502 (cs) [Submitted on 7 Apr 2026]

Title: AttnDiff: Attention-based Differential Fingerprinting for Large Language Models
Authors: Haobo Zhang, Zhenhua Xu, Junxian Li, Shangfeng Sheng, Dezhang Kong, Meng Han

Abstract: Protecting the intellectual property of open-weight large language models (LLMs) requires verifying whether a suspect model is derived from a victim model despite common laundering operations such as fine-tuning (including PPO/DPO), pruning/compression, and model merging. We propose AttnDiff, a data-efficient white-box framework that extracts fingerprints from a model's intrinsic information-routing behavior. AttnDiff probes minimally edited prompt pairs that induce controlled semantic conflicts, captures the resulting differential attention patterns, summarizes them with compact spectral descriptors, and compares models using CKA. Across Llama-2/3 and Qwen2.5 (3B–14B) and additional open-source families, it yields high similarity for related derivatives while cleanly separating unrelated model families (e.g., >0.98 vs. <0.22 with M = 60 probes). With 5–60 multi-domain probes, it supports practical provenance verification and accountability.

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as: arXiv:2604.05502
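The abstract's final comparison step uses CKA between per-model descriptor matrices. As a rough illustration only (the paper's exact kernel and descriptor dimensions are not given here), the commonly used linear variant of CKA over a matrix of M probe descriptors could be computed as below; the shapes, the 16-dimensional descriptors, and the toy "derivative" and "unrelated" models are all hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (num_probes, descriptor_dim) matrices.

    Rows are probes; columns are descriptor features. Returns a value
    in [0, 1], higher meaning more similar representations.
    """
    # Center each feature column before comparing.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

# Toy demo with M = 60 probes and hypothetical 16-dim spectral descriptors.
rng = np.random.default_rng(0)
victim = rng.standard_normal((60, 16))
# A "derivative" model: the victim's descriptors lightly perturbed,
# standing in for fine-tuning or pruning.
derivative = victim + 0.05 * rng.standard_normal((60, 16))
# An "unrelated" model family: independent descriptors.
unrelated = rng.standard_normal((60, 16))

cka_related = linear_cka(victim, derivative)    # close to 1
cka_unrelated = linear_cka(victim, unrelated)   # much lower
```

On synthetic data like this, the related pair scores near 1 while the independent pair scores far lower, mirroring the >0.98 vs. <0.22 separation the abstract reports for real model families.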