[2512.01925] Rectifying LLM Thought from Lens of Optimization
Computer Science > Computation and Language

arXiv:2512.01925 (cs)

[Submitted on 1 Dec 2025 (v1), last revised 7 Apr 2026 (this version, v2)]

Title: Rectifying LLM Thought from Lens of Optimization
Authors: Junnan Liu, Hongwei Liu, Songyang Zhang, Kai Chen

Abstract: Recent advancements in large language models (LLMs) have been driven by their emergent reasoning capabilities, particularly through long chain-of-thought (CoT) prompting, which enables thorough exploration and deliberation. Despite these advances, long-CoT LLMs often exhibit suboptimal reasoning behaviors, such as overthinking and excessively protracted reasoning chains, which can impair performance. In this paper, we analyze reasoning processes through an optimization lens, framing CoT as a gradient descent procedure where each reasoning step constitutes an update toward problem resolution. Building on this perspective, we introduce RePro (Rectifying Process-level Reward), a novel approach to refine LLM reasoning during post-training. RePro defines a surrogate objective function to assess the optimization process underlying CoT, utilizing a dual scoring mechanism to quantify its intensity and stability. These scores are aggregated into a composite process-level reward, seamlessly integrated into reinforcement learning with verifiable rewards (RLVR) pipelines to optimize LL...
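The abstract does not give the paper's actual formulas, but the dual-scoring idea can be illustrated with a minimal sketch. The Python below assumes the surrogate objective is tracked as a per-step scalar (lower means closer to a solution), takes "intensity" as the mean per-step decrease and "stability" as a monotonicity measure, and mixes the result into a verifiable outcome reward; every function name, weighting, and combination rule here is an assumption for illustration, not RePro's definition.

```python
from statistics import mean

def process_level_reward(step_losses, w_intensity=0.5, w_stability=0.5):
    """Hypothetical RePro-style composite process reward.

    step_losses: surrogate objective value after each reasoning step,
    framed as iterates of a gradient-descent-like procedure. The
    intensity/stability scores below are illustrative proxies only.
    """
    # Per-step "descent": positive when a step reduces the objective.
    deltas = [prev - cur for prev, cur in zip(step_losses, step_losses[1:])]
    if not deltas:
        return 0.0
    # Intensity: how strongly the chain descends on average.
    intensity = mean(deltas)
    # Stability: fraction of steps that actually make progress;
    # a monotone chain scores 1.0, an oscillating one scores lower.
    stability = sum(d > 0 for d in deltas) / len(deltas)
    return w_intensity * intensity + w_stability * stability

def rlvr_reward(outcome_correct, step_losses, beta=0.1):
    # Composite reward for an RLVR pipeline: verifiable outcome
    # signal plus the process-level term, weighted by beta.
    return float(outcome_correct) + beta * process_level_reward(step_losses)

if __name__ == "__main__":
    steady = [1.0, 0.7, 0.4, 0.2]        # concise, monotone chain
    meander = [1.0, 1.2, 0.5, 0.9, 0.2]  # overthinking, oscillating chain
    print(rlvr_reward(True, steady))     # higher process reward
    print(rlvr_reward(True, meander))    # lower process reward
```

Under this toy scoring, two chains that both reach a correct answer are separated by how directly they got there, which matches the abstract's stated goal of penalizing overthinking and protracted reasoning without sacrificing outcome correctness.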