[2602.08354] Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Summary
This paper examines whether large reasoning models (LRMs) implicitly know when to stop thinking, and introduces SAGE (Self-Aware Guided Efficient Reasoning), a sampling paradigm intended to improve both reasoning efficiency and accuracy.
Why It Matters
Long chains of thought add redundancy that costs compute and delays real-time applications. The authors report that LRMs implicitly know the appropriate time to stop thinking, but that current sampling paradigms obscure this capability. SAGE is proposed as a way to expose it and reduce redundancy in reasoning chains.
Key Takeaways
- Long chains of thought in LRMs often contain substantial redundancy, which impairs computational efficiency.
- Longer reasoning chains are frequently uncorrelated with correctness and can even hurt accuracy.
- SAGE is a sampling paradigm designed to surface the model's implicit sense of when to stop thinking.
- Integrating SAGE as mixed sampling into group-based reinforcement learning is reported to enhance performance on mathematical tasks.
- The findings could influence future AI model architectures and applications.
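The second takeaway, that chain length need not track correctness, is something a reader can check on their own evaluation logs. The following is a minimal sketch and not the paper's analysis: it computes the Pearson (point-biserial) correlation between chain length and a 0/1 correctness flag. The function name `length_correctness_correlation` and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def length_correctness_correlation(lengths, correct):
    """Pearson correlation between chain length and a 0/1 correctness flag.

    With a binary variable this is the point-biserial correlation.
    """
    if len(lengths) != len(correct) or len(lengths) < 2:
        raise ValueError("need two equally sized samples with at least 2 items")
    flags = [1.0 if c else 0.0 for c in correct]
    sd_len, sd_flag = pstdev(lengths), pstdev(flags)
    if sd_len == 0 or sd_flag == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    mean_len, mean_flag = mean(lengths), mean(flags)
    # Population covariance, matching the population standard deviations above.
    cov = mean([(l - mean_len) * (f - mean_flag) for l, f in zip(lengths, flags)])
    return cov / (sd_len * sd_flag)


# Hypothetical data: token counts of six sampled chains and whether each
# final answer was correct. These numbers are made up, not from the paper.
toy_lengths = [100, 200, 300, 400, 500, 600]
toy_correct = [True, False, True, False, True, False]

r = length_correctness_correlation(toy_lengths, toy_correct)
print(f"length/correctness correlation: {r:.3f}")  # weakly negative for this toy data
```

A value near zero would be consistent with length and correctness being uncorrelated, and a negative value with longer chains being less often correct. A single coefficient on a small sample is only a rough indicator.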
Computer Science > Artificial Intelligence
arXiv:2602.08354 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 23 Feb 2026 (this version, v2)]
Title: Does Your Reasoning Model Implicitly Know When to Stop Thinking?
Authors: Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuanda Wang, Zhixia Zhang, Hongyan Xie, Songshi Liang, Zehao Chen, Xuefeng Xiao, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang
Abstract: Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often results in substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even be detrimental to accuracy. In a further in-depth analysis of this phenomenon, we surprisingly uncover and empirically verify that LRMs implicitly know the appropriate time to stop thinking, while this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this efficient reasoning potential. Furthermore, integrating SAGE as mixed sampling into group-based reinforcement learning (SA...