[2602.08324] Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
Computer Science > Machine Learning
arXiv:2602.08324 (cs)
[Submitted on 9 Feb 2026 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression
Authors: Yuntian Tang, Bohan Jia, Wenxuan Huang, Lianyue Zhang, Jiao Xie, Wenxi Li, Rongrong Ji, Shaohui Lin

Abstract: Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead at inference. Existing CoT compression methods often suffer a critical loss of logical fidelity at high compression ratios, resulting in significant performance degradation. To achieve high-fidelity, fast reasoning, we propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy. To generate reliable, high-fidelity supervision, we first train a dedicated semantics-preserving compressor on mathematical CoT data with fine-grained annotations. An LLM is then fine-tuned on these compressed pairs via mixed-ratio supervised fine-tuning (SFT), teaching it to follow a spectrum of compression budgets and providing a stable initialization for reinforcement learning (RL). We...
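The mixed-ratio SFT stage described in the abstract can be illustrated with a minimal sketch: each problem is paired with CoTs compressed to several token budgets, and a budget-control prefix tells the model which ratio to follow. Everything here is a hypothetical illustration — the `compress_cot` stand-in, the `<budget=...>` prefix format, and all field names are assumptions, not the paper's actual compressor or data interface.

```python
# Hypothetical sketch of mixed-ratio SFT data construction: one training
# example per compression budget for each problem. The real Extra-CoT
# pipeline uses a trained semantics-preserving compressor; here we only
# keep a token prefix as a placeholder.

COMPRESSION_RATIOS = [0.1, 0.25, 0.5, 1.0]  # fraction of the original CoT kept

def compress_cot(cot_tokens, ratio):
    """Stand-in for the paper's trained compressor: keep a prefix of the
    tokens. The actual compressor preserves the logical content."""
    keep = max(1, int(len(cot_tokens) * ratio))
    return cot_tokens[:keep]

def build_mixed_ratio_examples(question, cot_tokens, answer):
    """Emit one SFT example per compression budget for a single problem."""
    examples = []
    for ratio in COMPRESSION_RATIOS:
        compressed = compress_cot(cot_tokens, ratio)
        # Budget-control prefix (assumed format) so the model learns to
        # follow a spectrum of compression budgets.
        prompt = f"<budget={ratio}> {question}"
        target = " ".join(compressed) + f" Answer: {answer}"
        examples.append({"prompt": prompt, "target": target})
    return examples

examples = build_mixed_ratio_examples(
    "What is 12 * 7?",
    "10 * 7 = 70 and 2 * 7 = 14, so 70 + 14 = 84.".split(),
    "84",
)
print(len(examples))  # one example per compression ratio
```

A dataset built this way would then feed a standard SFT loop, giving the model the budget-conditioned behavior the abstract says serves as a stable initialization for the subsequent RL stage.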