[2511.04934] Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding

arXiv - Machine Learning

Summary

The paper discusses the limitations of current unlearning methods in large language models (LLMs), revealing that they fail to effectively erase sensitive information when using probabilistic decoding. It introduces a new metric, leak@$k$, to evaluate unlearning reliability an...

Why It Matters

As LLMs become integral to applications involving sensitive data, ensuring they can forget information is crucial for compliance and ethical standards. This research highlights significant gaps in existing unlearning techniques, emphasizing the need for more robust solutions to protect user privacy and data integrity.

Key Takeaways

  • Current unlearning methods in LLMs are largely ineffective under probabilistic decoding.
  • The new leak@$k$ metric provides a systematic way to evaluate unlearning reliability.
  • The proposed RULE algorithm demonstrates improved performance in preventing knowledge leakage.

Computer Science > Machine Learning
arXiv:2511.04934 (cs)
[Submitted on 7 Nov 2025 (v1), last revised 21 Feb 2026 (this version, v2)]

Title: Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding
Authors: Hadi Reisizadeh, Jiajun Ruan, Yiwei Chen, Soumyadeep Pal, Sijia Liu, Mingyi Hong

Abstract: Unlearning in large language models (LLMs) is critical for regulatory compliance and for building ethical generative AI systems that avoid producing private, toxic, illegal, or copyrighted content. Despite rapid progress, in this work we show that almost all existing unlearning methods fail to achieve true forgetting in practice. Specifically, while evaluations of these "unlearned" models under deterministic (greedy) decoding often suggest successful knowledge removal using standard benchmarks (as has been done in the literature), we show that sensitive information reliably resurfaces when models are sampled with standard probabilistic decoding. To rigorously capture this vulnerability, we introduce leak@$k$, a new meta-evaluation metric that quantifies the likelihood of forgotten knowledge reappearing when generating $k$ samples from the model under realistic decoding strategies. Using three widely adopted benchmarks, TOFU, MUSE, and WMDP, we conduct the first large-scale, ...
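The abstract describes leak@$k$ as the likelihood that forgotten knowledge reappears in at least one of $k$ sampled generations. A minimal sketch of how such a metric can be estimated is below, assuming an unbiased estimator of the same form as the standard pass@$k$ estimator over $n$ sampled generations, of which $c$ contain the supposedly forgotten content; the function name and estimator form are illustrative assumptions, not taken from the paper.

```python
from math import comb

def leak_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k generations leaks,
    given n total sampled generations of which c contained a leak.

    Assumed estimator (pass@k-style): 1 - C(n - c, k) / C(n, k),
    i.e. one minus the chance that k draws without replacement
    all come from the n - c non-leaking generations.
    """
    if k > n:
        raise ValueError("k cannot exceed the number of samples n")
    if n - c < k:
        # Fewer than k non-leaking generations exist, so any
        # k-subset must include at least one leak.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Under this form, leak@$k$ is monotonically non-decreasing in $k$, which matches the paper's point: a model that looks clean under a single greedy decode (effectively $k = 1$) can still leak with high probability once many probabilistic samples are drawn.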
