[2602.15843] The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
Summary
This article explores the 'perplexity paradox' in large language models (LLMs): code prompts tolerate compression far better than math prompts, because compressors prune the predictable but task-critical numbers in math problems. It also introduces Task-Aware Adaptive Compression (TAAC), a new adaptive compression algorithm.
Why It Matters
Prompt compression directly reduces inference cost, but only if it preserves task performance. Knowing which prompt types tolerate compression lets practitioners cut token budgets aggressively for code generation while protecting reasoning tasks, making LLM applications cheaper without sacrificing quality.
Key Takeaways
- Code prompts tolerate higher compression rates than math prompts.
- A new adaptive compression algorithm, Task-Aware Adaptive Compression (TAAC), cuts cost by 22% while preserving 96% of output quality.
- The study validates its findings across multiple code and reasoning benchmarks (HumanEval, MBPP, GSM8K, MATH, and others), and a per-token perplexity analysis explains why compression fails on math.
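The takeaways above imply a simple policy: pick the compression rate by task type, and shield task-critical tokens from pruning. The sketch below is a hypothetical illustration in the spirit of TAAC, not the paper's algorithm; the 0.6 rate for code comes from the abstract, while the reasoning rate and the numeral guard are invented assumptions.

```python
import re

# Hypothetical task-aware compression policy (illustrative only; the
# paper's actual TAAC algorithm is not reproduced in this summary).

def choose_ratio(task: str) -> float:
    """Pick a compression rate per task type.

    r is read here as the fraction of tokens removed. 0.6 for code is the
    threshold reported in the abstract; 0.3 for other tasks is a placeholder.
    """
    return 0.6 if task == "code" else 0.3

def protected(token: str, task: str) -> bool:
    """Shield task-critical tokens from pruning: numbers in math prompts,
    which the paper's perplexity analysis found are otherwise dropped."""
    return task == "math" and re.fullmatch(r"\d+(\.\d+)?", token) is not None
```

A pruner using this policy would skip any token for which `protected(...)` is true, regardless of how predictable (low-perplexity) that token is.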
Computer Science > Computation and Language
arXiv:2602.15843 (cs)
[Submitted on 21 Jan 2026]
Title: The Perplexity Paradox: Why Code Compresses Better Than Math in LLM Prompts
Authors: Warren Johnson

Abstract: In "Compress or Route?" (Johnson, 2026), we found that code generation tolerates aggressive prompt compression (r >= 0.6) while chain-of-thought reasoning degrades gradually. That study was limited to HumanEval (164 problems), left the "perplexity paradox" mechanism unvalidated, and provided no adaptive algorithm. This paper addresses all three gaps. First, we validate across six code benchmarks (HumanEval, MBPP, HumanEval+, MultiPL-E) and four reasoning benchmarks (GSM8K, MATH, ARC-Challenge, MMLU-STEM), confirming the compression threshold generalizes across languages and difficulties. Second, we conduct the first per-token perplexity analysis (n=723 tokens), revealing a "perplexity paradox": code syntax tokens are preserved (high perplexity) while numerical values in math problems are pruned despite being task-critical (low perplexity). Signature injection recovers +34 percentage points in pass rate (5.3% to 39.3%; Cohen's h=0.890). Third, we propose TAAC (Task-Aware Adaptive Compression), achieving 22% cost reduction with 96% quality preservation, outperforming fixed-ratio compression by 7%. MBPP validation (n=1,8...
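The "perplexity paradox" the abstract describes can be made concrete with a toy surprisal-based pruner (in the spirit of LLMLingua-style compressors; the paper's exact method is not shown here). The token probabilities below are invented for illustration, not drawn from any real language model, and r is read as the fraction of tokens removed, an assumption based on the abstract's "aggressive compression (r >= 0.6)" phrasing.

```python
import math

def surprisal(p: float) -> float:
    """Surprisal in bits: -log2 P(token | context)."""
    return -math.log2(p)

def compress(tokens, probs, r):
    """Drop the most predictable (lowest-surprisal) tokens first.

    r is the fraction of tokens removed; at least one token is kept.
    """
    n_keep = max(1, len(tokens) - math.ceil(r * len(tokens)))
    # rank token positions by surprisal, least predictable first
    ranked = sorted(range(len(tokens)),
                    key=lambda i: surprisal(probs[i]), reverse=True)
    kept = sorted(ranked[:n_keep])  # restore original word order
    return [tokens[i] for i in kept]

# Toy math prompt: the numerals are the most predictable tokens in context
# (low surprisal, invented values) yet task-critical -- so a purely
# perplexity-driven compressor prunes them first.
math_tokens = ["John", "has", "7", "apples", "and", "buys", "3", "more"]
math_probs  = [0.02,   0.30, 0.60, 0.25,    0.50,  0.20,  0.55, 0.40]

print(compress(math_tokens, math_probs, r=0.6))
# -> ['John', 'apples', 'buys']  (both numerals dropped)
```

At r = 0.6 the pruner discards "7" and "3" while keeping low-probability content words, which is exactly the failure mode the paper's per-token analysis attributes to math prompts.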