[2602.21189] Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
Summary
The paper analyzes the trade-off between the Pass@k and Pass@1 performance metrics in large language models, showing how optimizing for Pass@k can degrade Pass@1 through prompt interference.
Why It Matters
Understanding the relationship between Pass@k and Pass@1 is crucial for optimizing large language models, especially in applications where single-shot accuracy is essential. This research highlights the complexities of model fine-tuning and the potential pitfalls of current optimization strategies.
Key Takeaways
- Pass@k optimization can lead to a decrease in Pass@1 performance.
- The trade-off is significant for applications requiring reliable single-shot responses.
- Prompt interference is a key factor in the observed performance degradation.
- The study provides a theoretical framework for understanding these dynamics.
- Experiments validate the theoretical findings in the context of mathematical reasoning tasks.
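For concreteness, pass@k is commonly estimated from n sampled completions with the standard unbiased combinatorial estimator (a widely used evaluation convention, not something specific to this paper) — a minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass@k given n samples, c of which pass the verifier.

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that a uniformly
    random subset of k of the n samples contains at least one passing solution.
    """
    if n - c < k:
        return 1.0  # fewer than k failing samples: every size-k subset passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples per prompt, 3 of which pass the verifier.
print(pass_at_k(10, 3, 1))  # pass@1 = 0.3
print(pass_at_k(10, 3, 5))  # pass@5 ≈ 0.917
```

Note how sharply the metric rises with k even at modest single-sample accuracy, which is exactly why multi-sample inference is attractive — and why pass@1 can silently erode while pass@k still looks healthy.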
Computer Science > Machine Learning · arXiv:2602.21189 (cs) · Submitted on 24 Feb 2026
Authors: Anas Barakat, Souradip Chakraborty, Khushbu Pahwa, Amrit Singh Bedi
Abstract: Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@$k$ improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@$k$ policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these pro...
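The reweighting effect the abstract describes can be seen in closed form: if a prompt's single-sample success probability is p, then pass@k = 1 - (1-p)^k, whose gradient with respect to p is k(1-p)^(k-1). This weight grows as p shrinks, so pass@k optimization concentrates updates on low-success prompts. A toy numeric sketch (illustrative only, not the paper's actual objective or proof):

```python
def pass_at_k_smooth(p: float, k: int) -> float:
    """Expected pass@k for a prompt with per-sample success probability p."""
    return 1.0 - (1.0 - p) ** k

def gradient_weight(p: float, k: int) -> float:
    """d/dp pass@k = k * (1 - p)^(k - 1): the implicit per-prompt weight."""
    return k * (1.0 - p) ** (k - 1)

k = 8
for p in (0.05, 0.3, 0.6, 0.9):
    print(f"p={p:.2f}  pass@{k}={pass_at_k_smooth(p, k):.3f}  "
          f"weight={gradient_weight(p, k):.4g}")
# For k=8, the weight at p=0.05 is ~5.6 while at p=0.9 it is ~8e-7: the
# pass@k gradient is dominated by hard (low-p) prompts. If those prompts'
# updates conflict with the ones that keep easy prompts accurate, pass@1
# can fall -- the prompt-interference mechanism the abstract points to.
```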