[2602.12846] Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models
Summary
The paper presents Amortized Reasoning Tree Search (ARTS), a method that improves reasoning in Large Language Models by decoupling the proposal of candidate reasoning steps from the decision of which to pursue, addressing a failure mode of reinforcement-learning fine-tuning in which valid but rare reasoning paths are suppressed.
Why It Matters
This research is significant because it tackles the tendency of reinforcement-learning fine-tuning to suppress valid but rare reasoning paths in Large Language Models, a pathology that degrades performance on complex reasoning tasks. By introducing ARTS, the authors provide a method that preserves the base model's diversity while enhancing its reasoning capabilities, which is crucial for advancing AI applications across domains.
Key Takeaways
- ARTS decouples proposal and decision-making to improve reasoning in LLMs.
- The approach addresses the 'Normalization Squeeze' issue in reinforcement learning.
- ARTS achieves competitive performance on benchmarks without altering the generative model.
- The method shows significant recovery in performance on long-tail reasoning tasks.
- A Flow Matching objective enables effective navigation of complex search spaces.
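The decoupling in the first takeaway can be illustrated with a minimal sketch: a frozen proposal model generates candidate next steps, and a separate decision module scores and selects among them, so the generator's parameters (and hence its diversity) are never touched. All names here (`propose`, `verifier_score`, `arts_step`) and the toy scoring heuristic are illustrative assumptions, not the paper's implementation.

```python
import math

def propose(state, k=4):
    """Stand-in for the frozen base LLM: enumerate k candidate next
    reasoning steps for the current partial trace (hypothetical)."""
    return [f"{state} -> step{i}" for i in range(k)]

def verifier_score(trace):
    """Stand-in for the learned verifier: an unnormalized estimate of
    how promising a partial trace is (toy heuristic, not the paper's)."""
    return (sum(ord(c) for c in trace) % 10) / 10.0

def arts_step(state):
    """One decoupled search step: the proposal model generates
    candidates, the decision module picks among them. The generator
    is never updated, so rare-but-valid branches stay reachable."""
    candidates = propose(state)
    scores = [verifier_score(c) for c in candidates]
    # Softmax over verifier scores: low-score branches retain mass
    # instead of being hard-arg-maxed to extinction.
    z = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / z for s in scores]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best]
```

Sampling from `probs` instead of taking the argmax would preserve even more of the base model's diversity; the choice is a search-policy knob, not part of the decoupling itself.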
Paper Details
Computer Science > Machine Learning. arXiv:2602.12846 (cs). Submitted on 13 Feb 2026.
Authors: Zesheng Hong, Jiadong Yu, Hui Pan
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has established itself as the dominant paradigm for instilling rigorous reasoning capabilities in Large Language Models. While effective at amplifying dominant behaviors, we identify a critical pathology in this alignment process: the systematic suppression of valid but rare (low-likelihood under the base model distribution) reasoning paths. We theoretically characterize this phenomenon as a "Normalization Squeeze," where the interplay between mode-seeking policy gradients and finite sampling acts as a high-pass likelihood filter, driving the probability of rare correct traces to statistical extinction. To counteract this collapse without discarding the base model's latent diversity, we propose Amortized Reasoning Tree Search (ARTS). Unlike standard approaches that force internalization via parameter updates, ARTS prioritizes deliberation by decoupling generation from verification. We introduce a Flow Matching objective that repurposes the verifier to estimate the conservation of probability flow, enabling ...
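The abstract's "conservation of probability flow" resembles the flow-matching condition used in GFlowNet-style training: at every internal node of the search tree, flow in should equal flow out, and at leaves the flow should match the reward. The sketch below shows that conservation loss on a toy tree, assuming verifier-derived terminal rewards; the function and variable names are illustrative, not the paper's exact objective.

```python
import math

def flow_matching_loss(flow, tree, reward):
    """GFlowNet-style flow-conservation loss on a reasoning tree
    (a sketch of the general idea, not the paper's exact objective).

    flow:   dict mapping each node to its estimated flow F(s) > 0
    tree:   dict mapping each internal node to its list of children
    reward: dict mapping each leaf node to its verifier reward R(s) > 0
    """
    loss = 0.0
    for node, children in tree.items():
        # Conservation: flow into an internal node equals flow out.
        outgoing = sum(flow[c] for c in children)
        loss += (math.log(flow[node]) - math.log(outgoing)) ** 2
    for leaf, r in reward.items():
        # Boundary condition: terminal flow should equal the reward.
        loss += (math.log(flow[leaf]) - math.log(r)) ** 2
    return loss

# Toy tree: root splits into two branches; the verifier favors "b".
tree = {"root": ["a", "b"]}
reward = {"a": 0.1, "b": 1.0}
flow = {"root": 1.1, "a": 0.1, "b": 1.0}  # perfectly conserved
print(flow_matching_loss(flow, tree, reward))  # -> 0.0
```

When the flow estimates are perfectly conserved and match the terminal rewards, the loss is zero; any imbalance at a node contributes a squared log-ratio penalty, which is what the verifier would be trained to minimize under this reading.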