[2603.29993] Extending MONA in Camera Dropbox: Reproduction, Learned

[2603.29993] Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

arXiv - AI April 01, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.29993: Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation

Computer Science > Artificial Intelligence arXiv:2603.29993 (cs) [Submitted on 31 Mar 2026] Title:Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation Authors:Nathan Heath View a PDF of the paper titled Extending MONA in Camera Dropbox: Reproduction, Learned Approval, and Design Implications for Reward-Hacking Mitigation, by Nathan Heath View PDF HTML (experimental) Abstract:Myopic Optimization with Non-myopic Approval (MONA) mitigates multi-step reward hacking by restricting the agent's planning horizon while supplying far-sighted approval as a training signal~\cite{farquhar2025mona}. The original paper identifies a critical open question: how the method of constructing approval -- particularly the degree to which approval depends on achieved outcomes -- affects whether MONA's safety guarantees hold. We present a reproduction-first extension of the public MONA Camera Dropbox environment that (i)~repackages the released codebase as a standard Python project with scripted PPO training, (ii)~confirms the published contrast between ordinary RL (91.5\% reward-hacking rate) and oracle MONA (0.0\% hacking rate) using the released reference arrays, and (iii)~introduces a modular learned-approval suite spanning oracle, noisy, misspecified, learned, and calibrated approval mechanisms. In reduced-budget pilot sweeps across approval methods, horizons, dataset sizes, and calibration strategies, the best calibrated lear...

Originally published on April 01, 2026. Curated by AI News.

Llms

Can Claude Opus 4.7 and Ensemble AI Models Finally Make Code Review Reliable?

Ensemble AI models like Claude Opus 4.7 transform code review reliability. Discover how multi-model approaches catch subtle bugs human re...

AI Tools & Products · 9 min · about 2 hours ago

Ai Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min · about 3 hours ago

Machine Learning

Improving AI models’ ability to explain their predictions

AI News - General · 9 min · about 3 hours ago

Machine Learning

New technique makes AI models leaner and faster while they’re still learning