[2411.17501] The Limits of Inference Scaling Through Resampling
Computer Science > Machine Learning
arXiv:2411.17501 (cs)
[Submitted on 26 Nov 2024 (v1), last revised 26 Mar 2026 (this version, v3)]

Title: The Limits of Inference Scaling Through Resampling
Authors: Benedikt Stroebl, Sayash Kapoor, Arvind Narayanan

Abstract: Recent research has generated hope that inference scaling, such as resampling solutions until they pass verifiers like unit tests, could allow weaker models to match stronger ones. Beyond inference, this approach also enables training reasoning models, where data is curated by rejection sampling against a verifier. However, we show that this approach is fundamentally limited when verifiers are imperfect and have a non-zero probability of producing false positives. Resampling cannot decrease this probability, so it imposes an upper bound on the accuracy of resampling-based inference scaling, regardless of compute budget. Our analysis shows a strong correlation between a model's single-sample accuracy and its false positive rate on HumanEval and MBPP, whose unit tests have limited coverage. Therefore, no amount of inference scaling of weaker models can enable them to match the single-sample accuracy of a sufficiently strong model. Empirical results show that the optimal number of sampling attempts is often fewer than 10, as the negative utility of false positives ou...
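The accuracy ceiling described in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's code; it is a minimal simulation under simplifying assumptions I am adding for illustration: each sample is independently correct with probability `p`, the verifier always accepts correct solutions, and it accepts incorrect ones with a fixed false-positive rate `fpr`. Under these assumptions, the accuracy of "resample until the verifier accepts" saturates at p / (p + (1 - p) * fpr) no matter how many attempts are allowed:

```python
import random

def resample_accuracy(p, fpr, k, trials=20000, seed=0):
    """Estimate accuracy of return-the-first-verified-sample resampling.

    p      -- single-sample probability a generated solution is correct
    fpr    -- verifier false-positive rate (chance it accepts a wrong solution)
    k      -- maximum number of sampling attempts per problem
    Returns the fraction of trials whose returned solution is correct
    (exhausting all k attempts without acceptance counts as incorrect).
    """
    rng = random.Random(seed)
    correct_returns = 0
    for _ in range(trials):
        for _ in range(k):
            is_correct = rng.random() < p
            # Assumed verifier: never rejects a correct solution,
            # accepts an incorrect one with probability fpr.
            if is_correct or rng.random() < fpr:
                correct_returns += is_correct
                break
    return correct_returns / trials

# Analytic ceiling as k -> infinity: p / (p + (1 - p) * fpr).
# With fpr > 0 this is strictly below 1, matching the paper's claim that
# no amount of resampling lets a weak model reach a strong model's accuracy.
ceiling = lambda p, fpr: p / (p + (1 - p) * fpr)
```

For example, a model with 30% single-sample accuracy and a verifier with a 20% false-positive rate is capped at 0.3 / (0.3 + 0.7 * 0.2) ≈ 0.68 regardless of compute budget, whereas a perfect verifier (fpr = 0) would let accuracy approach 1 as attempts grow.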