[2505.16122] Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
Computer Science > Machine Learning
arXiv:2505.16122 (cs)
[Submitted on 22 May 2025 (v1), last revised 2 Mar 2026 (this version, v3)]
Title: Plan and Budget: Effective and Efficient Test-Time Scaling on Reasoning Large Language Models
Authors: Junhong Lin, Xinyue Zeng, Jie Zhu, Song Wang, Julian Shun, Jun Wu, Dawei Zhou
Abstract: Large Language Models (LLMs) have achieved remarkable success in complex reasoning tasks, but their inference remains computationally inefficient. We observe a common failure mode in many prevalent LLMs: overthinking, in which models generate verbose and tangential reasoning traces even for simple queries. Recent work has tried to mitigate this by enforcing fixed token budgets; however, this can lead to underthinking, especially on harder problems. Through empirical analysis, we identify that this inefficiency often stems from unclear problem-solving strategies. To formalize this, we develop a theoretical model, BAM (Budget Allocation Model), which models reasoning as a sequence of sub-questions with varying uncertainty, and introduce the E3 metric to capture the trade-off between correctness and computation efficiency. Building on theoretical results from BAM, we propose Plan-and-Budget, a model-agnostic, test-time framework that decomposes complex queries into sub-questions and alloca...