[2504.02010] When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models
Computer Science > Machine Learning

arXiv:2504.02010 (cs)

[Submitted on 2 Apr 2025 (v1), last revised 2 Mar 2026 (this version, v3)]

Title: When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

Authors: Nan Zhang, Eugene Kwek, Yusen Zhang, Ngoc-Hieu Nguyen, Prasenjit Mitra, Rui Zhang

Abstract: Compression methods, including quantization, distillation, and pruning, improve the computational efficiency of large reasoning models (LRMs). However, existing studies either fail to sufficiently compare all three compression methods on LRMs or lack in-depth interpretation analysis. In this paper, we investigate how the reasoning capabilities of LRMs are compromised during compression, through performance benchmarking and mechanistic interpretation. To uncover the effects of compression on reasoning performance, we benchmark quantized, distilled, and pruned DeepSeek-R1 models on four reasoning datasets (AIME 2024, FOLIO, Temporal Sequences, and MuSiQue). To precisely locate compression effects on model weights, we adapt difference of means and attribution patching techniques, focusing on the activation of every linear component in compressed LRMs, to interpret fine-grained causal relationships between weights and various reasoning capabilities...
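The abstract's difference-of-means analysis can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes per-component activations have already been recorded (in practice this would require forward hooks on each linear layer of the original and compressed models), and the component names and array shapes below are hypothetical. Each component is scored by the L2 norm of the gap between its mean activations under the two conditions.

```python
import numpy as np

def difference_of_means(acts_a, acts_b):
    """Score each linear component by the L2 norm of the difference
    between its mean activation under two conditions (here: original
    vs. compressed model).

    acts_a, acts_b: dicts mapping component name -> array of shape
    (num_prompts, hidden_dim) of recorded activations.
    Returns: dict mapping component name -> scalar score.
    """
    scores = {}
    for name in acts_a:
        mean_a = acts_a[name].mean(axis=0)  # mean activation, condition A
        mean_b = acts_b[name].mean(axis=0)  # mean activation, condition B
        scores[name] = float(np.linalg.norm(mean_a - mean_b))
    return scores

# Toy example with two hypothetical components and synthetic activations.
rng = np.random.default_rng(0)
orig = {
    "mlp.down_proj": rng.normal(0.0, 1.0, (8, 4)),
    "attn.o_proj": rng.normal(0.0, 1.0, (8, 4)),
}
# Simulate a compressed model whose MLP activations shifted uniformly by 2.0.
comp = {
    "mlp.down_proj": orig["mlp.down_proj"] + 2.0,
    "attn.o_proj": orig["attn.o_proj"].copy(),
}
scores = difference_of_means(orig, comp)
```

In this toy setup the shifted MLP component receives a score of 4.0 (a constant gap of 2.0 across 4 dimensions) while the unchanged attention projection scores 0, so ranking components by this score surfaces where compression moved the activations most.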