[2603.04308] Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs
Computer Science > Machine Learning
arXiv:2603.04308 (cs)
[Submitted on 4 Mar 2026]

Title: Activation Outliers in Transformer Quantization: Reproduction, Statistical Analysis, and Deployment Tradeoffs
Authors: Pranav Kumar Kaliaperumal

Abstract: Post-training quantization (PTQ) of transformers is known to suffer severe accuracy degradation due to structured activation outliers, as originally analyzed by Bondarenko et al. (EMNLP 2021) in work associated with Qualcomm AI Research. This paper provides a reproducible replication and systems-level extension of that phenomenon in BERT-base fine-tuned on QNLI. Under global W8A8 quantization, validation accuracy drops sharply from 89.66% (FP32) to 54.33%, a decrease of 35.33 points. Statistical analysis of the FP32 activations shows strongly heavy-tailed behavior that intensifies with model depth: kurtosis reaches 271 in the final layers, and approximately 55% of activation energy is concentrated in the top 1% of channels. We evaluate several mitigation strategies. Mixed-precision PTQ restores accuracy close to the FP32 baseline (89.42%). Per-embedding-group (PEG) quantization is highly sensitive to the grouping structure, improving accuracy from 66.12% with three groups to 86.18% with four. In contrast, perc...
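The two activation statistics the abstract reports (kurtosis and the energy share of the top 1% of channels), and the reason a single global W8A8 scale fails, can be illustrated with a small synthetic sketch. The tensor shape, the injected outlier channels, and the `quantize_dequant` helper below are all hypothetical stand-ins, not artifacts from the paper; real statistics would be gathered from BERT-base activations via forward hooks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activation tensor of shape (tokens, channels). We draw
# from a Student-t(3) distribution and scale a few channels so that a
# small set of channels dominates, mimicking structured outliers.
acts = rng.standard_t(df=3, size=(512, 768))
acts[:, :8] *= 40.0  # illustrative outlier channels

# Excess kurtosis (Fisher definition) of the flattened activations.
x = acts.ravel()
kurtosis = ((x - x.mean()) ** 4).mean() / x.var() ** 2 - 3

# Fraction of activation energy (sum of squares) carried by the
# top 1% of channels, ranked by per-channel energy.
energy = (acts ** 2).sum(axis=0)
k = max(1, int(0.01 * energy.size))
top_share = np.sort(energy)[::-1][:k].sum() / energy.sum()

def quantize_dequant(x, axis=None):
    # Symmetric int8 fake-quantization. axis=None uses one global
    # (per-tensor) scale; axis=0 uses one scale per channel.
    scale = np.abs(x).max(axis=axis, keepdims=axis is not None) / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

# With a global scale, the outlier channels inflate the step size and
# crush the non-outlier channels; per-channel scales avoid this.
err_global = np.abs(quantize_dequant(acts) - acts).mean()
err_per_channel = np.abs(quantize_dequant(acts, axis=0) - acts).mean()

print(f"kurtosis={kurtosis:.1f}, top-1% energy share={top_share:.2%}")
print(f"mean abs error: global={err_global:.4f}, per-channel={err_per_channel:.4f}")
```

On this synthetic tensor the per-channel error is far below the global-scale error, which is the same mechanism behind the W8A8 accuracy collapse and the effectiveness of finer-grained schemes such as PEG quantization described in the abstract.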