[2603.26595] PQuantML: A Tool for End-to-End Hardware-aware Model Compression
Computer Science > Machine Learning
arXiv:2603.26595 (cs) [Submitted on 27 Mar 2026]

Title: PQuantML: A Tool for End-to-End Hardware-aware Model Compression
Authors: Roope Niemi, Anastasiia Petrovych, Arghya Ranjan Das, Enrico Lupi, Chang Sun, Dimitrios Danopoulos, Marlon Joshua Helbing, Mia Liu, Sebastian Dittmeier, Michael Kagan, Vladimir Loncar, Maurizio Pierini

Abstract: PQuantML is a new open-source, hardware-aware neural network model compression library tailored to end-to-end workflows. Motivated by the need to deploy performant models in environments with strict latency constraints, PQuantML simplifies the training of compressed models by providing a unified interface for applying pruning and quantization, either jointly or individually. The library implements multiple pruning methods at different granularities, as well as fixed-point quantization with support for High-Granularity Quantization. We evaluate PQuantML on representative tasks such as jet substructure classification (so-called jet tagging), an edge-computing problem related to real-time LHC data processing. Using various pruning methods combined with fixed-point quantization, PQuantML achieves substantial parameter and bit-width reductions while maintaining accuracy. The resulting compression is further compared against existing tools such as QKeras and HGQ.
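The two operations the abstract combines, pruning and fixed-point quantization, can be illustrated with a minimal NumPy sketch. This is not the PQuantML API: the helper names `prune_by_magnitude` and `quantize_fixed_point` are hypothetical, and the sketch only shows what unstructured magnitude pruning and signed fixed-point rounding do to a weight tensor, and how the two compose when applied jointly.

```python
import numpy as np

def prune_by_magnitude(w, sparsity):
    """Unstructured pruning: zero out roughly the smallest-magnitude
    `sparsity` fraction of the weights (ties at the threshold may zero
    slightly more)."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def quantize_fixed_point(w, total_bits=8, frac_bits=6):
    """Round to a signed fixed-point grid with `frac_bits` fractional
    bits, saturating at the representable range of `total_bits` bits."""
    scale = 2 ** frac_bits
    qmin = -(2 ** (total_bits - 1))
    qmax = 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax)
    return q / scale

# Joint compression: prune first, then quantize the survivors.
w = np.array([0.1, -0.5, 0.03, 0.9])
w_c = quantize_fixed_point(prune_by_magnitude(w, 0.5),
                           total_bits=8, frac_bits=6)
# → [0.0, -0.5, 0.0, 0.90625]  (0.9*64 = 57.6 rounds to 58/64)
```

In a training loop these transforms would typically be reapplied (or maintained via masks and straight-through estimators) so the network can adapt to the sparsity pattern and the reduced precision, which is the kind of workflow a unified interface like the one described here automates.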