[2512.01925] Rectifying LLM Thought from Lens of Optimization


arXiv - AI 3 min read

About this article


Computer Science > Computation and Language

arXiv:2512.01925 (cs) [Submitted on 1 Dec 2025 (v1), last revised 7 Apr 2026 (this version, v2)]

Title: Rectifying LLM Thought from Lens of Optimization

Authors: Junnan Liu, Hongwei Liu, Songyang Zhang, Kai Chen

Abstract: Recent advancements in large language models (LLMs) have been driven by their emergent reasoning capabilities, particularly through long chain-of-thought (CoT) prompting, which enables thorough exploration and deliberation. Despite these advances, long-CoT LLMs often exhibit suboptimal reasoning behaviors, such as overthinking and excessively protracted reasoning chains, which can impair performance. In this paper, we analyze reasoning processes through an optimization lens, framing CoT as a gradient descent procedure where each reasoning step constitutes an update toward problem resolution. Building on this perspective, we introduce RePro (Rectifying Process-level Reward), a novel approach to refine LLM reasoning during post-training. RePro defines a surrogate objective function to assess the optimization process underlying CoT, utilizing a dual scoring mechanism to quantify its intensity and stability. These scores are aggregated into a composite process-level reward, seamlessly integrated into reinforcement learning with verifiable rewards (RLVR) pipelines to optimize LL...
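The abstract frames each CoT step as a gradient-descent update and scores the resulting trajectory for intensity and stability. The sketch below is only an illustration of that idea, not the paper's actual surrogate objective: the per-step progress scores, the specific intensity/stability formulas, and the 0.5/0.5 weighting are all assumptions made for the example.

```python
# Illustrative sketch (NOT RePro's actual objective): score a reasoning
# trajectory by treating each step as a gradient-descent-style update.
from typing import List


def step_deltas(step_scores: List[float]) -> List[float]:
    """Per-step 'updates': changes in an assumed per-step progress score
    (e.g. from a value model; how such scores are obtained is not shown)."""
    return [b - a for a, b in zip(step_scores, step_scores[1:])]


def intensity(deltas: List[float]) -> float:
    """How strongly the chain moves toward a solution: mean improvement."""
    return sum(deltas) / len(deltas) if deltas else 0.0


def stability(deltas: List[float]) -> float:
    """How consistent the updates are: shrinks as delta variance grows."""
    if not deltas:
        return 0.0
    mu = sum(deltas) / len(deltas)
    var = sum((d - mu) ** 2 for d in deltas) / len(deltas)
    return 1.0 / (1.0 + var)


def process_reward(step_scores: List[float],
                   w_int: float = 0.5, w_stab: float = 0.5) -> float:
    """Composite process-level reward that an RLVR pipeline could add to a
    verifiable outcome reward (the weights here are assumptions)."""
    d = step_deltas(step_scores)
    return w_int * intensity(d) + w_stab * stability(d)


# A steady, improving chain scores higher than an erratic one:
steady = process_reward([0.1, 0.3, 0.5, 0.7, 0.9])
erratic = process_reward([0.1, 0.6, 0.2, 0.8, 0.3])
```

Under this toy scoring, the steady trajectory earns a higher reward than the erratic one, matching the paper's motivation of penalizing unstable, overlong reasoning.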

Originally published on April 09, 2026. Curated by AI News.

Related Articles

Llms

Diffusion for generating/editing ASTs? [D]

I’m not a machine learning expert or anything, but I do enjoy learning about how it all works. I’ve noticed that one of the main limitati...

Reddit - Machine Learning · 1 min
Llms

ChatGPT’s ‘Trusted Contact’ will alert loved ones of safety concerns | The Verge

OpenAI is launching an optional safety feature for ChatGPT that allows adult users to assign an emergency contact for mental health and s...

The Verge - AI · 4 min
Llms

AI is helpful but still not “there” yet

what I mean is that every time I use Claude, or Grok or any of the AI platforms and tools, I realize how far this technology is from repl...

Reddit - Artificial Intelligence · 1 min
Llms

ChatGPT Has 'Goblin' Mania in the US. In China It Will 'Catch You Steadily' | WIRED

OpenAI's chatbot has some weird linguistic tics in Chinese that are driving users crazy.

Wired - AI · 8 min

