[2512.15742] SHARe-KAN: Post-Training Vector Quantization for Cache-Resident KAN Inference
Computer Science > Machine Learning

arXiv:2512.15742 (cs)
[Submitted on 10 Dec 2025 (v1), last revised 14 Apr 2026 (this version, v2)]

Title: SHARe-KAN: Post-Training Vector Quantization for Cache-Resident KAN Inference
Authors: Jeff Smith

Abstract: Pre-trained Vision Kolmogorov-Arnold Networks (KANs) store a dense B-spline grid on every edge, inflating prediction-head parameter counts by more than 140X relative to a comparable MLP and pushing inference into a memory-bound regime on edge accelerators. Standard magnitude pruning fails on these pre-trained models: zero-shot sparsity collapses accuracy, and restoring it requires an iterative fine-tuning loop that is impractical in deployment settings. We present SHARe-KAN, a post-training compiler that compresses spline coefficients via a Gain-Shape-Bias decomposition with a layer-shared codebook, paired with LUTHAM, an ExecuTorch runtime that maps the codebook into on-chip L2. On PASCAL VOC detection with a ResNet-50 backbone, SHARe-KAN Int8 reaches 9.3X storage compression over the Dense KAN baseline (6.32 MB vs. 58.67 MB prediction head) at a 2.0 point in-domain accuracy cost (80.22% vs. 82.22% mAP), with no retraining. Zero-shot transfer to COCO retains 88.9% of the Dense KAN mAP; most of this gap comes from the VQ clustering step itself, and further quantization from FP32 t...
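The abstract's Gain-Shape-Bias decomposition can be illustrated with a minimal sketch. The idea, assuming the standard gain-shape vector-quantization recipe (the paper's exact pipeline is not shown here): remove a per-vector bias (mean), factor the residual into a scalar gain (its norm) and a unit-norm shape, and quantize only the shapes against a shared codebook learned by k-means. All function names below are illustrative, not the paper's API.

```python
import numpy as np

def build_codebook(shapes, k, iters=20, seed=0):
    """Plain k-means over unit-norm shape vectors; centers stay on the unit sphere."""
    rng = np.random.default_rng(seed)
    centers = shapes[rng.choice(len(shapes), k, replace=False)]
    for _ in range(iters):
        # assign each shape to its nearest center (squared Euclidean distance)
        d = ((shapes[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)
        for j in range(k):
            members = shapes[idx == j]
            if len(members):
                centers[j] = members.mean(0)
        # project centers back onto the unit sphere
        centers /= np.linalg.norm(centers, axis=1, keepdims=True) + 1e-12
    return centers

def encode(W, centers):
    """Gain-Shape-Bias split of each row of W: bias = row mean, gain = residual
    norm, shape = unit residual mapped to the index of its nearest codeword."""
    bias = W.mean(axis=1, keepdims=True)
    resid = W - bias
    gain = np.linalg.norm(resid, axis=1, keepdims=True)
    shape = resid / (gain + 1e-12)
    d = ((shape[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return bias, gain, d.argmin(1)

def decode(bias, gain, idx, centers):
    """Reconstruct rows as bias + gain * shared codeword."""
    return bias + gain * centers[idx]
```

Because every layer's shapes index into one shared codebook, the per-edge storage cost drops to a bias, a gain, and a small integer index, which is what lets the codebook itself stay resident in on-chip cache.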