[2604.04648] From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism

arXiv - Machine Learning

About this article

Computer Science > Machine Learning
arXiv:2604.04648 (cs) [Submitted on 6 Apr 2026]

Title: From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism
Authors: Zhuohao Yu, Zhiwei Steven Wu, Adam Block

Abstract: Inference-time compute scaling has emerged as a powerful paradigm for improving language model performance on a wide range of tasks, but the question of how best to use the additional compute remains open. A popular approach is Best-of-N (BoN) sampling, where N candidate responses are generated, scored according to a reward model, and the highest-scoring response is selected. While this approach can improve performance, it is vulnerable to reward hacking, where performance degrades as N increases because selection favors responses that exploit imperfections in the reward model rather than genuinely improving generation quality. Prior attempts to mitigate reward hacking, via stronger reward models or heavy-handed distributional regularization, either fail to fully address over-optimization or are too conservative to exploit the additional compute. In this work, we explore the principle of pessimism in reinforcement learning, which uses lower confidence bounds on value estimates to avoid out-of-distribution (OOD) actions with uncertain reward estimates. Our approach, termed caution, can be seen as the reverse of curiosity: where cu...
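The abstract describes two selection rules in prose: standard Best-of-N, and a pessimistic variant that ranks candidates by a lower confidence bound on the reward. A minimal sketch of both follows, assuming hypothetical `generate` and `reward` callables in place of a real sampler and reward model, and using ensemble spread as one common proxy for reward uncertainty; the paper's actual lower-confidence-bound construction may differ.

```python
import numpy as np

def best_of_n(prompt, generate, reward, n=16):
    """Standard Best-of-N: sample n candidates, keep the top-scoring one.

    Vulnerable to reward hacking: as n grows, the argmax increasingly
    favors responses that exploit errors in the reward model.
    """
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    return candidates[int(np.argmax(scores))]

def cautious_best_of_n(prompt, generate, reward_ensemble, n=16, kappa=1.0):
    """Pessimistic Best-of-N: rank by a lower confidence bound (LCB).

    Reward uncertainty is approximated here by the standard deviation
    across an ensemble of reward models (an illustrative assumption,
    not necessarily the paper's construction). Subtracting kappa * std
    penalizes candidates whose high scores are uncertain, i.e. likely
    out-of-distribution for the reward model.
    """
    candidates = [generate(prompt) for _ in range(n)]
    lcb = []
    for c in candidates:
        rs = np.array([r(prompt, c) for r in reward_ensemble])
        lcb.append(rs.mean() - kappa * rs.std())
    return candidates[int(np.argmax(lcb))]
```

Larger kappa trades raw reward for robustness: at kappa = 0 this reduces to standard BoN, while a very large kappa behaves conservatively and may leave compute gains on the table, mirroring the over-optimization versus over-conservatism tension the abstract describes.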

Originally published on April 07, 2026. Curated by AI News.
