[2505.15340] SSR: Speculative Parallel Scaling Reasoning in Test-time
Computer Science > Machine Learning

arXiv:2505.15340 (cs)

[Submitted on 21 May 2025 (v1), last revised 21 Mar 2026 (this version, v2)]

Title: SSR: Speculative Parallel Scaling Reasoning in Test-time

Authors: Yuanlin Chu, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

Abstract: Large language models (LLMs) have achieved impressive results on multi-step mathematical reasoning, yet at the cost of high computational overhead. This challenge is particularly acute for test-time scaling methods such as parallel decoding, which increase answer diversity but scale poorly in efficiency. To address this efficiency-accuracy trade-off, we propose SSR (Speculative Parallel Scaling Reasoning), a training-free framework that leverages a key insight: by introducing speculative decoding at the step level, we can accelerate reasoning without sacrificing correctness. SSR integrates two components: a Selective Parallel Module (SPM) that identifies a small set of promising reasoning strategies via model-internal scoring, and Step-level Speculative Decoding (SSD), which enables efficient draft-target collaboration for fine-grained reasoning acceleration. Experiments on three mathematical benchmarks (AIME 2024, MATH-500, and LiveMathBench) demonstrate that SSR achieves strong gains over baselines. For instance, on LiveMathBench, SSR improves pa...
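The abstract names SPM and SSD only at a high level. As a rough illustration of the control flow they describe (not the paper's actual algorithm), the sketch below selects top-k reasoning strategies by a mock model-internal score, then runs a step-level draft/verify loop per strategy. All functions (`draft_propose_step`, `target_generate_step`, `target_accept_prob`, `spm_select`) are hypothetical stand-ins for model calls, not an API from the paper.

```python
import random

random.seed(0)  # deterministic mock scores for the example below

def spm_select(strategies, scores, k=2):
    """SPM-style sketch: keep the top-k strategies by a
    (here, externally supplied) model-internal score."""
    ranked = sorted(zip(scores, strategies), reverse=True)
    return [s for _, s in ranked[:k]]

def draft_propose_step(context):
    """Stand-in for a small draft model proposing the next step cheaply."""
    return f"draft_step_{len(context) + 1}"

def target_generate_step(context):
    """Stand-in for the large target model generating the step itself
    (the slow fallback path when a drafted step is rejected)."""
    return f"target_step_{len(context) + 1}"

def target_accept_prob(context, step):
    """Stand-in for the target model's acceptance score of a drafted
    step; mocked here with a random number."""
    return random.random()

def ssd_decode(n_steps, threshold=0.5):
    """SSD-style sketch: accept drafted reasoning steps the target
    model endorses, regenerate with the target model otherwise."""
    context, accepted = [], 0
    for _ in range(n_steps):
        step = draft_propose_step(context)
        if target_accept_prob(context, step) >= threshold:
            accepted += 1  # cheap drafted step kept as-is
        else:
            step = target_generate_step(context)  # slow path
        context.append(step)
    return context, accepted

# SPM picks promising strategies, SSD accelerates each one.
chosen = spm_select(["algebraic", "geometric", "numeric"],
                    [0.1, 0.9, 0.5], k=2)  # -> ["geometric", "numeric"]
steps, n_accepted = ssd_decode(6)
print(chosen, n_accepted, "of", len(steps), "steps accepted from draft")
```

The acceptance threshold trades speed for fidelity: a lower threshold keeps more cheap drafted steps, a higher one routes more steps through the target model.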