[2510.05825] Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling
Computer Science > Machine Learning

arXiv:2510.05825 (cs)

[Submitted on 7 Oct 2025 (v1), last revised 27 Mar 2026 (this version, v2)]

Title: Mitigating Premature Exploitation in Particle-based Monte Carlo for Inference-Time Scaling

Authors: Giorgio Giannone, Guangxuan Xu, Nikhil Shivakumar Nayak, Rohan Mahesh Awhad, Shivchander Sudalairaj, Kai Xu, Akash Srivastava

Abstract: Inference-Time Scaling (ITS) improves language models by allocating more computation at generation time. Particle Filtering (PF) has emerged as a strong ITS method for complex mathematical reasoning tasks, but it is vulnerable when guided by process reward models, which often assign overconfident scores early in the reasoning process. This causes PF to suffer from premature exploitation: it myopically commits to locally promising trajectories, prunes potentially correct hypotheses, and converges to suboptimal solutions. This failure mode, known as particle impoverishment, is especially severe under constrained computational budgets. To address this, we analyze the problem and identify two root causes: a lack of diversity in the particle set due to overconfident resampling, and a consequent inability to assess the potential of a reasoning path. We introduce Entropic Particle Filtering (ePF), an algorithm that integrates two new techniques to...
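The impoverishment mechanism the abstract describes can be illustrated with standard particle-filter machinery. The sketch below (hypothetical helper names; not the paper's ePF implementation) performs multinomial resampling over a set of candidate reasoning paths and measures diversity with the effective sample size, ESS = 1 / Σ w_i², where the w_i are normalized weights. When an overconfident reward model puts nearly all weight on one path, the ESS collapses and resampling produces near-duplicate copies of a single hypothesis:

```python
import random


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) over normalized weights.

    ESS equals the number of particles for uniform weights and
    approaches 1 as the weights concentrate on a single particle,
    i.e. low ESS signals particle impoverishment.
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    return 1.0 / sum(w * w for w in norm)


def resample(particles, weights, rng):
    """Multinomial resampling: draw N particles with probability
    proportional to their (unnormalized) weights."""
    return rng.choices(particles, weights=weights, k=len(particles))


rng = random.Random(0)
particles = list(range(8))  # eight hypothetical partial reasoning paths

# Calibrated scores: weight is spread across the particle set.
flat = [1.0] * 8
# Overconfident early score: almost all mass on one trajectory.
peaked = [100.0] + [1.0] * 7

print(effective_sample_size(flat))    # high ESS: diverse set survives
print(effective_sample_size(peaked))  # ESS near 1: set collapses

flat_survivors = set(resample(particles, flat, rng))
peaked_survivors = set(resample(particles, peaked, rng))
```

With the peaked weights, nearly every resampling draw returns particle 0, so alternative hypotheses are pruned before their long-run potential can be assessed, which is precisely the premature-exploitation failure mode the paper targets.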