[2602.21189] Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training

arXiv - Machine Learning

Summary

The paper examines the trade-off between the Pass@k and Pass@1 performance metrics in large language models, showing how optimizing for Pass@k can degrade Pass@1 through prompt interference, which induces gradient conflict during post-training.

Why It Matters

Understanding the relationship between Pass@k and Pass@1 is crucial for optimizing large language models, especially in deployments where single-shot accuracy is a hard operational constraint due to latency and cost budgets. This research shows that inference-aware fine-tuning can carry a hidden cost to single-shot reliability and explains mechanistically why.

Key Takeaways

  • Pass@k optimization can lead to a decrease in Pass@1 performance.
  • The trade-off is significant for applications requiring reliable single-shot responses.
  • Prompt interference is a key factor in the observed performance degradation.
  • The study provides a theoretical framework for understanding these dynamics.
  • Experiments validate the theoretical findings in the context of mathematical reasoning tasks.

Computer Science > Machine Learning
arXiv:2602.21189 (cs)
[Submitted on 24 Feb 2026]

Title: Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Authors: Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi

Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@$k$ improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@$k$ policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these pro...
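For context, the pass@$k$ metric described in the abstract has a standard unbiased estimator (popularized by code-generation benchmarks such as HumanEval): given $n$ sampled solutions of which $c$ pass the verifier, pass@$k$ = $1 - \binom{n-c}{k}/\binom{n}{k}$. A minimal sketch in Python, not taken from this paper:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which passed the verifier.

    Computes 1 - C(n-c, k) / C(n, k): the probability that at least one of
    k solutions drawn without replacement from the n samples is correct.
    """
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 2 samples and 1 pass: pass@1 = 0.5, pass@2 = 1.0
print(pass_at_k(2, 1, 1), pass_at_k(2, 1, 2))
```

Note that pass@1 reduces to the plain success rate $c/n$, which is exactly the quantity the paper argues can degrade when the optimization target is pass@$k$ for $k > 1$.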
