[2505.16723] LLM Fingerprinting via Semantically Conditioned Watermarks
Summary
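The paper proposes fingerprinting LLMs with semantically conditioned watermarks: the model is trained to watermark its responses only to prompts from a chosen semantic domain, so that the fingerprint survives common deployment steps such as finetuning and quantization.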
Why It Matters
As large language models (LLMs) are integrated into more applications, model owners need a reliable way to verify that a deployed model is theirs. Existing fingerprinting methods rely on memorized responses to a few fixed queries, which often do not survive finetuning or quantization and can be detected and filtered from outputs. This research addresses those weaknesses with a fingerprint designed to withstand such modifications.
Key Takeaways
- Introduces a new approach to LLM fingerprinting using semantic watermarks.
- Overcomes limitations of fixed-key fingerprinting methods, whose memorized responses often do not survive finetuning or quantization and can be detected and filtered out.
- Demonstrates robustness against common deployment scenarios through experimental evaluation.
- Offers a statistical watermarking signal instead of fixed atypical responses.
- Lets the model owner verify ownership by querying the model with prompts from a predetermined semantic domain (e.g., French language).
arXiv:2505.16723 (cs)
[Submitted on 22 May 2025 (v1), last revised 19 Feb 2026 (this version, v3)]
Title: LLM Fingerprinting via Semantically Conditioned Watermarks
Authors: Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev
Abstract: Most LLM fingerprinting methods teach the model to respond to a few fixed queries with predefined atypical responses (keys). This memorization often does not survive common deployment steps such as finetuning or quantization, and such keys can be easily detected and filtered from LLM responses, ultimately breaking the fingerprint. To overcome these limitations we introduce LLM fingerprinting via semantically conditioned watermarks, replacing fixed query sets with a broad semantic domain, and replacing brittle atypical keys with a statistical watermarking signal diffused throughout each response. After teaching the model to watermark its responses only to prompts from a predetermined domain e.g., French language, the model owner can use queries from that domain to reliably detect the fingerprint and verify ownership. As we confirm in our thorough experimental evaluation, our fingerprint is both stealthy and robust to all common deployment scenarios.
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as: arXiv:...
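To make the verification step in the abstract more concrete, the following is a minimal sketch of how an owner might test responses to in-domain queries for a statistical watermark. The abstract does not say which watermarking scheme is used, so this assumes a generic keyed "green-list" scheme checked with a one-sided z-test. The key string, `GAMMA`, the threshold, and all function names are hypothetical and chosen only for illustration; tokens are represented as plain integer IDs.

```python
import hashlib
import math

GAMMA = 0.25      # assumed fraction of the vocabulary marked "green" at each step
THRESHOLD = 4.0   # assumed z-score above which the fingerprint is declared present


def is_green(prev_token: int, token: int, key: str = "owner-secret",
             gamma: float = GAMMA) -> bool:
    """Keyed pseudo-random test: is `token` in the green list seeded by `prev_token`?"""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    return u < gamma


def pooled_z_score(responses, key: str = "owner-secret",
                   gamma: float = GAMMA) -> float:
    """Pool green-token counts over several tokenized responses and return a z-score.

    Without a watermark each scored token is green with probability `gamma`,
    so the green count is approximately Binomial(n, gamma).
    """
    green, n = 0, 0
    for tokens in responses:
        for prev, tok in zip(tokens, tokens[1:]):
            green += is_green(prev, tok, key, gamma)
            n += 1
    if n == 0:
        raise ValueError("need at least one response with two or more tokens")
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def fingerprint_detected(responses, threshold: float = THRESHOLD) -> bool:
    """Decide whether the pooled responses carry the assumed watermark."""
    return pooled_z_score(responses) > threshold
```

In this sketch the owner would tokenize the suspect model's answers to prompts from the chosen domain (for example, French-language prompts) and pass them to `fingerprint_detected`. Because the signal is spread over every token rather than concentrated in a fixed key, pooling more responses increases the statistical power of the test.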