[2603.26425] CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.26425 (cs)
[Submitted on 27 Mar 2026]

Title: CPUBone: Efficient Vision Backbone Design for Devices with Low Parallelization Capabilities
Authors: Moritz Nottebaum, Matteo Dunnhofer, Christian Micheloni

Abstract: Recent research on vision backbone architectures has predominantly focused on optimizing efficiency for hardware platforms with high parallel processing capabilities. This category increasingly includes embedded systems such as mobile phones and embedded AI accelerator modules. In contrast, CPUs cannot parallelize operations in the same manner, so models for them benefit from a specific design philosophy that balances the number of operations (MACs) against hardware-efficient execution, measured as MACs per second (MACpS). In pursuit of this, we investigate two modifications to standard convolutions aimed at reducing computational cost: grouping convolutions and reducing kernel sizes. While both adaptations substantially decrease the total number of MACs required for inference, sustaining low latency also requires preserving hardware efficiency. Our experiments across diverse CPU devices confirm that these adaptations successfully retain high hardware efficiency on CPUs. Based on these insights...
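The trade-off between MACs and MACpS described in the abstract can be sketched with the standard MAC count for a 2D convolution. The following is a minimal illustration, not the authors' code: the function names, the layer shape (64 channels, 56x56 output), and the use of the relation latency = MACs / MACpS are assumptions made for this example.

```python
def conv_macs(c_in, c_out, kernel_size, h_out, w_out, groups=1):
    """Multiply-accumulate count of a 2D convolution (bias ignored).

    Each output element needs (c_in / groups) * kernel_size^2 MACs,
    and there are c_out * h_out * w_out output elements.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("groups must divide both c_in and c_out")
    return h_out * w_out * c_out * (c_in // groups) * kernel_size * kernel_size


def latency_seconds(macs, macs_per_second):
    """Latency implied by a MAC count and an achieved throughput (MACpS)."""
    return macs / macs_per_second


# Hypothetical layer: 64 -> 64 channels, 56x56 output feature map.
baseline = conv_macs(64, 64, kernel_size=3, h_out=56, w_out=56)
grouped = conv_macs(64, 64, kernel_size=3, h_out=56, w_out=56, groups=4)
small_kernel = conv_macs(64, 64, kernel_size=2, h_out=56, w_out=56)

# Grouping by g divides MACs by g; shrinking a 3x3 kernel to 2x2
# scales MACs by 4/9.
print(baseline, grouped, small_kernel)
```

Under this model, a reduction in MACs lowers latency proportionally only if the achieved MACpS stays the same. If a modified layer ran at a lower MACpS on a given device, part of the saving would be lost, which is why the abstract stresses retaining hardware efficiency alongside cutting MACs.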