[2508.03346] Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy

arXiv - AI · 4 min read · Article

Summary

This article presents a novel framework for compressing Chain-of-Thought (CoT) reasoning traces in Large Language Models (LLMs), improving inference efficiency while maintaining accuracy.

Why It Matters

As LLMs become integral to various applications, optimizing their reasoning processes is crucial for improving performance and reducing computational costs. This research addresses redundancy in CoT reasoning traces, which inflates token counts and inference cost and limits scalability in real-world applications.

Key Takeaways

  • Introduces a CoT compression framework based on step entropy to reduce redundancy (see the sketch after this list).
  • Demonstrates that 80% of low-entropy steps can be pruned with minimal accuracy loss.
  • Proposes a two-stage training strategy combining Supervised Fine-Tuning and reinforcement learning for improved efficiency.
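
The summary does not spell out how step entropy is computed. One plausible reading, consistent with the abstract's description of a per-step informational measure, is to aggregate the token-level predictive entropies of the tokens generated within each step. The sketch below assumes that reading; the function names, the mean aggregation, and the toy distributions are illustrative, not the paper's exact formulation.

```python
import numpy as np

def token_entropy(next_token_probs: np.ndarray) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    p = next_token_probs[next_token_probs > 0]
    return float(-(p * np.log(p)).sum())

def step_entropy(step_distributions: list[np.ndarray]) -> float:
    """Entropy score of one reasoning step: here, the mean token-level
    entropy over the tokens the model generated for that step.
    (Assumed aggregation; the paper's exact definition may differ.)"""
    return float(np.mean([token_entropy(d) for d in step_distributions]))

# Toy example: three "steps", each represented by the next-token
# distributions (over a 4-symbol vocabulary) the model produced for it.
steps = [
    [np.array([0.97, 0.01, 0.01, 0.01])] * 4,  # near-deterministic -> low entropy
    [np.array([0.40, 0.30, 0.20, 0.10])] * 4,  # uncertain -> high entropy
    [np.array([0.85, 0.05, 0.05, 0.05])] * 4,
]
print([round(step_entropy(s), 3) for s in steps])
# The first step scores lowest and is the strongest pruning candidate.
```

Low scores flag steps the model was essentially certain about, which is the intuition behind treating them as redundant.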

Computer Science > Artificial Intelligence
arXiv:2508.03346 (cs)
[Submitted on 5 Aug 2025 (v1), last revised 16 Feb 2026 (this version, v2)]

Title: Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
Authors: Zeju Li, Jianyuan Zhong, Ziyang Zheng, Xiangyu Wen, Zhijian Xu, Yingying Cheng, Fan Zhang, Qiang Xu

Abstract: Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate verbose thought processes with considerable redundancy, leading to increased inference costs and reduced efficiency. We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of individual reasoning steps to identify redundancy. Through theoretical analysis and extensive empirical validation on mathematical reasoning benchmarks, we demonstrate that steps with low entropy are indeed highly redundant. Our experiments reveal that an astonishing 80% of low-entropy intermediate steps can be pruned with minor degradation in the final answer accuracy across DeepSeek-R1-7B, 14B and Qwen3-8B. This finding sharply contrasts with random or high-entropy pruning, which severely impairs reasoning performance. Building on this, we propose a novel two-stage training strategy combining Supervised Fine-Tuning...
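
Given per-step scores like the ones above, the pruning described in the abstract amounts to ranking steps by entropy and dropping the lowest-scoring 80% while keeping the survivors in their original order. A minimal sketch under that assumption (the abstract does not say how, or whether, removed steps are marked in the compressed trace; the example steps and scores are made up):

```python
import numpy as np

def prune_low_entropy_steps(steps: list[str], entropies: list[float],
                            prune_ratio: float = 0.8) -> list[str]:
    """Keep only the highest-entropy (1 - prune_ratio) fraction of steps,
    preserving their original order; the rest are treated as redundant."""
    n_keep = max(1, round(len(steps) * (1.0 - prune_ratio)))
    keep = sorted(np.argsort(entropies)[-n_keep:])  # indices of steps to retain
    return [steps[i] for i in keep]

# Toy usage with made-up steps and scores.
trace = ["restate the problem", "expand (a+b)^2", "re-check the expansion",
         "substitute a=3, b=4", "simplify to 49"]
scores = [0.2, 1.1, 0.3, 1.4, 0.9]
print(prune_low_entropy_steps(trace, scores))
# -> ['substitute a=3, b=4']: at an 80% prune ratio, only the single
#    highest-entropy step of these five survives.
```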

