[2602.18116] Cut Less, Fold More: Model Compression through the Lens of Projection Geometry
Summary
This paper studies calibration-free model compression for neural networks through the lens of projection geometry, comparing structured pruning and model folding without any retraining.
Why It Matters
As neural networks grow in complexity, efficient deployment becomes crucial. This research presents a novel approach to model compression that can enhance performance while reducing resource requirements, making it relevant for developers and researchers in machine learning and AI.
Key Takeaways
- Model folding typically outperforms structured pruning in post-compression accuracy, with the largest gains at moderate-to-high compression.
- The study formalizes both compression techniques as orthogonal projection operators: pruning is an axis-aligned projection, folding a low-rank projection via weight clustering.
- Folding's advantage holds across diverse training conditions, though the gap narrows and occasionally reverses for specific training setups.
- Calibration-free methods are essential for deploying compressed neural networks at scale.
- The evaluation spans over 1000 checkpoints across vision and language models, providing robust empirical evidence.
Computer Science > Machine Learning — arXiv:2602.18116 (cs)
Submitted on 20 Feb 2026
Authors: Olga Saukh, Dong Wang, Haris Šikić, Yun Cheng, Lothar Thiele
Abstract: Compressing neural networks without retraining is vital for deployment at scale. We study calibration-free compression through the lens of projection geometry: structured pruning is an axis-aligned projection, whereas model folding performs a low-rank projection via weight clustering. We formalize both as orthogonal operators and show that, within a rank distance of one, folding provably yields smaller parameter reconstruction error, and under mild smoothness assumptions, smaller functional perturbations than pruning. At scale, we evaluate >1000 checkpoints spanning ResNet18, PreActResNet18, ViT-B/32, and CLIP ViT-B/32 on CIFAR-10 and ImageNet-1K, covering diverse training hyperparameters (optimizers, learning rates, augmentations, regularization, sharpness-aware training), as well as multiple LLaMA-family 60M and 130M parameter models trained on C4. We show that folding typically achieves higher post-compression accuracy, with the largest gains at moderate-high compression. The gap narrows and occasionally reverses at specific training setups. Our results po...
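To make the geometric framing concrete, here is a toy NumPy sketch (illustrative only, not the paper's algorithm or proof): structured pruning zeroes out whole rows of a weight matrix, an axis-aligned projection, while folding replaces each row by its cluster centroid, a low-rank projection via weight clustering. The matrix shape, cluster count, and the simple Lloyd-iteration clustering are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))  # toy weight matrix (rows = output neurons)
k = 48                          # rows/clusters to keep in both schemes

# Structured pruning: axis-aligned projection.
# Keep the k rows with the largest L2 norm; zero the rest.
norms = np.linalg.norm(W, axis=1)
keep = np.argsort(norms)[-k:]
W_pruned = np.zeros_like(W)
W_pruned[keep] = W[keep]

# Model folding: low-rank projection via weight clustering.
# Cluster the rows into k groups (plain Lloyd iterations) and
# replace each row with its centroid, so W_folded has at most
# k distinct rows, hence rank at most k.
centroids = W[rng.choice(len(W), size=k, replace=False)]
for _ in range(20):
    dists = np.linalg.norm(W[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    for c in range(k):
        members = assign == c
        if members.any():
            centroids[c] = W[members].mean(axis=0)
W_folded = centroids[assign]

# Parameter reconstruction error (Frobenius norm) for each projection.
err_prune = np.linalg.norm(W - W_pruned)
err_fold = np.linalg.norm(W - W_folded)
print(f"pruning error: {err_prune:.2f}, folding error: {err_fold:.2f}")
```

On this random toy matrix the two errors are merely comparable; the paper's provable gap concerns operators within a rank distance of one, which this sketch does not reproduce.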