[2509.26522] Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting
Summary
The paper presents a novel method, Entropy After </Think> (EAT), for exiting a reasoning model's chain of thought early, cutting unnecessary computation while maintaining accuracy.
Why It Matters
As reasoning models become integral to AI applications, optimizing their efficiency is crucial. EAT addresses overthinking in LLMs, where a model keeps revising an answer it has already gotten right, wasting computational resources. The method offers a practical way to improve performance per unit of compute by allocating inference budget only where it is needed.
Key Takeaways
- EAT helps detect and prevent overthinking in reasoning models.
- The method can reduce token usage by 12-22% without compromising accuracy.
- EAT is effective even in black box settings where model internals are inaccessible.
- The approach allows for adaptive compute allocation based on reasoning dynamics.
- Empirical results on MATH500 and AIME2025 validate the proposed method.
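At its core, the EAT signal is the Shannon entropy of the model's next-token distribution after a `</think>` token is appended. A minimal sketch of that entropy computation, with a plain probability list standing in for real model logits (the variable names and example distributions are illustrative, not from the paper):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution.

    In the paper's setting, `probs` would be the model's distribution over
    the token that follows an appended </think> tag; here it is simply a
    list of probabilities summing to 1.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# A peaked distribution (the model is confident in its answer) has low
# entropy; a flat distribution (the model is still "thinking") has high
# entropy. These example distributions are made up for illustration.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
```

Tracked over the course of a rollout, this quantity yields the EAT trajectory that the paper reports decreasing and stabilizing as Pass@1 plateaus.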
Computer Science > Machine Learning
arXiv:2509.26522 (cs)
[Submitted on 30 Sep 2025 (v1), last revised 19 Feb 2026 (this version, v2)]
Authors: Xi Wang, James McInerney, Lequn Wang, Nathan Kallus
Abstract: Reasoning LLMs show improved performance with longer chains of thought. However, recent work has highlighted their tendency to overthink, continuing to revise answers even after reaching the correct solution. We quantitatively confirm this inefficiency from the distribution-dynamics perspective by tracking Pass@1 for answers averaged over a large number of rollouts, and find that the model often begins to always produce the correct answer early in the reasoning, making extra reasoning tokens wasteful. To detect and prevent overthinking, we propose a simple and inexpensive novel signal, Entropy After </Think> (EAT), for monitoring and deciding whether to exit reasoning early. By appending a stop-thinking token (</think>) and monitoring the entropy of the following token as the model reasons, we obtain a trajectory that decreases and stabilizes when Pass@1 plateaus; thresholding its variance under an exponential moving average yields a practical stopping rule. Importantly, our approach enables adaptively allocating compute based on the EAT tra...
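The stopping rule described in the abstract, thresholding the variance of the EAT trajectory under an exponential moving average, can be sketched as below. The decay rate `alpha`, the `threshold`, and the `warmup` count are illustrative values chosen for the sketch, not parameters reported in the paper:

```python
class EMAVarianceStopper:
    """Early-exit rule sketch: track the EAT signal with exponential moving
    averages of its first and second moments, and signal an exit once the
    implied variance falls below a threshold, i.e. once the trajectory has
    stabilized. Hyperparameter values here are illustrative only.
    """

    def __init__(self, alpha=0.1, threshold=1e-3, warmup=5):
        self.alpha = alpha          # EMA decay rate
        self.threshold = threshold  # variance level treated as "stable"
        self.warmup = warmup        # minimum measurements before exiting
        self.mean = None
        self.sq_mean = None
        self.steps = 0

    def update(self, eat_value):
        """Feed one EAT measurement; return True when it is safe to exit."""
        if self.mean is None:
            self.mean = eat_value
            self.sq_mean = eat_value * eat_value
        else:
            a = self.alpha
            self.mean = (1 - a) * self.mean + a * eat_value
            self.sq_mean = (1 - a) * self.sq_mean + a * eat_value * eat_value
        self.steps += 1
        # Var[X] = E[X^2] - E[X]^2, clamped against numerical negatives.
        variance = max(self.sq_mean - self.mean ** 2, 0.0)
        return self.steps >= self.warmup and variance < self.threshold
```

In use, `eat_value` at each step would be the entropy of the token following an appended `</think>`, so the rule fires only once that entropy trajectory has flattened out, which is the point at which, per the paper, further reasoning tokens are wasteful.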