[2602.10273] Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning
Statistics > Machine Learning

arXiv:2602.10273 (stat)

[Submitted on 10 Feb 2026 (v1), last revised 23 Mar 2026 (this version, v2)]

Title: Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

Authors: Seyedarmin Azizi, Erfan Baghaei Potraghloo, Minoo Ahmadi, Souvik Kundu, Massoud Pedram

Abstract: Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained model, rather than modifying its weights. A natural formalization is the sequence-level power distribution $\pi_\alpha(y\mid x)\propto p_\theta(y\mid x)^\alpha$ ($\alpha>1$), which concentrates mass on whole sequences instead of adjusting token-level temperature. Prior work shows that Metropolis--Hastings (MH) sampling from this distribution recovers strong reasoning performance, but at order-of-magnitude inference slowdowns. We introduce Power-SMC, a training-free Sequential Monte Carlo scheme that targets the same objective while remaining close to standard decoding latency. Power-SMC advances a small particle set in parallel, corrects importance weights token-by-token, and resamples when necessary, all within a single GPU-friendly batched decode. We prove that temperature $\tau=1/\alpha$ [abstract truncated]
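The SMC scheme the abstract describes (parallel particles, token-by-token importance-weight correction against the target $p_\theta^\alpha$, and resampling when needed) can be sketched on a toy categorical language model. This is only an illustrative sketch under assumptions, not the paper's implementation: the toy next-token distribution, the particle count, and the effective-sample-size (ESS) resampling trigger are all stand-ins. The key line is the incremental weight update: sampling from $p_\theta$ as the proposal while targeting $p_\theta^\alpha$ multiplies each particle's weight by $p_\theta(\text{token})^{\alpha-1}$ per token.

```python
import math
import random

random.seed(0)

def toy_lm(prefix):
    """Stand-in for p_theta(next token | prefix): a 4-token softmax whose
    logits flip depending on the last token, so sequences are non-trivial."""
    logits = [1.0, 0.5, 0.2, 0.1]
    if prefix and prefix[-1] == 0:
        logits = logits[::-1]
    z = [math.exp(l) for l in logits]
    s = sum(z)
    return [x / s for x in z]

def categorical(p):
    """Draw one index from a probability vector p."""
    r, c = random.random(), 0.0
    for i, pi in enumerate(p):
        c += pi
        if r <= c:
            return i
    return len(p) - 1

def power_smc(alpha=2.0, n_particles=8, length=10, ess_frac=0.5):
    """Sketch of sequence-level power sampling via SMC.

    Proposal: sample each token from p_theta itself.
    Target factor per token: p_theta(token)^alpha, so the incremental
    importance weight is p_theta(token)^(alpha - 1).
    Resample (multinomial) whenever ESS drops below ess_frac * n_particles.
    """
    particles = [[] for _ in range(n_particles)]
    logw = [0.0] * n_particles
    for _ in range(length):
        # Advance every particle by one token (batched decode in practice).
        for i in range(n_particles):
            p = toy_lm(particles[i])
            t = categorical(p)
            particles[i].append(t)
            logw[i] += (alpha - 1.0) * math.log(p[t])
        # Normalized weights and effective sample size.
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        s = sum(w)
        w = [x / s for x in w]
        ess = 1.0 / sum(x * x for x in w)
        if ess < ess_frac * n_particles:
            idx = random.choices(range(n_particles), weights=w, k=n_particles)
            particles = [list(particles[j]) for j in idx]
            logw = [0.0] * n_particles
    return particles, logw

particles, logw = power_smc()
```

Because $\alpha > 1$, the weight update down-weights particles that pass through low-probability tokens, so resampling concentrates the particle set on high-likelihood trajectories, which is exactly the "distribution sharpening" the abstract formalizes.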