[2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
Summary
This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplication, enhancing performance in neural network accelerators.
Why It Matters
As AI applications increasingly rely on edge devices, optimizing hardware for mixed-precision quantization is crucial. This architecture addresses the challenge of balancing resource consumption and accuracy, making it relevant for developers and researchers in hardware design and AI.
Key Takeaways
- Proposes a runtime reconfigurable architecture for multi-precision quantized multiplication.
- Achieves a 1.3185x to 3.5671x speedup when running inference on mixed-precision models.
- Demonstrates lower critical path delay and supports higher clock frequencies (250 MHz).
- Addresses the limitations of traditional hardware designs in supporting precision reconfiguration.
- Evaluated on the Ultra96 FPGA platform, showcasing practical application.
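To make the idea of bitwise, runtime-reconfigurable multiplication more concrete, here is a minimal software sketch. It is not the authors' hardware design, whose processing-element layout is not described in this summary. It assumes unsigned operands and uses hypothetical function names (`bitwise_multiply`, `bitwise_dot`). Each product is built only from 1-bit AND partial products that are shifted and accumulated, and the operand widths are ordinary arguments, so the precision can change from call to call.

```python
from typing import Sequence


def bitwise_multiply(a: int, b: int, a_bits: int, b_bits: int) -> int:
    """Multiply two unsigned integers using only 1-bit AND partial products.

    a_bits and b_bits are chosen per call, which stands in for selecting
    the precision at runtime (an assumption of this sketch).
    """
    if not 0 <= a < (1 << a_bits) or not 0 <= b < (1 << b_bits):
        raise ValueError("operand does not fit in the requested precision")
    acc = 0
    for i in range(a_bits):
        a_i = (a >> i) & 1          # bit i of a
        for j in range(b_bits):
            b_j = (b >> j) & 1      # bit j of b
            # A 1-bit product is an AND; its weight is 2**(i + j).
            acc += (a_i & b_j) << (i + j)
    return acc


def bitwise_dot(xs: Sequence[int], ws: Sequence[int],
                x_bits: int, w_bits: int) -> int:
    """Dot product of activations and weights at a chosen precision."""
    return sum(bitwise_multiply(x, w, x_bits, w_bits) for x, w in zip(xs, ws))


if __name__ == "__main__":
    # The same routine serves a 4-bit x 2-bit layer and a 8-bit x 8-bit layer.
    print(bitwise_multiply(5, 3, a_bits=4, b_bits=2))
    print(bitwise_multiply(200, 100, a_bits=8, b_bits=8))
    print(bitwise_dot([1, 2, 3], [3, 2, 1], x_bits=2, w_bits=2))
```

In this sketch, the number of 1-bit partial products is `a_bits * b_bits`, so lower-precision layers do proportionally less work. That is one intuition for why a bit-level array could benefit from mixed precision, though the paper's measured speedups come from its own FPGA implementation, not from this model.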
Subject: Computer Science > Hardware Architecture
Submitted on 26 Feb 2026
Authors: Yuhao Liu, Salim Ullah, Akash Kumar
Abstract
Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explored the quantization technologies in related lightweight accelerator designs to reduce hardware resource consumption. However, low precision leads to high accuracy loss in inference. Therefore, mixed-precision quantization becomes an alternative solution by applying different precision in different layers to trade off resource consumption and accuracy. Because regular designs for multiplication on hardware cannot support the precision reconfiguration for a multi-precision Quantized Neural Network (QNN) model in runtime, we propose a runtime reconfigurable multi-precision multi-channel bitwise systolic array design for QNN accelerators. We have implemented and evaluated our work on the Ultra96 FPGA platform. Results show that our work can achieve 1.3185 to 3.5671 times speedup in inferring mixed-precision models and has less critical path delay, s...