[2602.19268] CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications
Summary
The article presents CORVET, a resource-efficient vector processing engine built around an iterative CORDIC-based MAC unit for high-throughput AIoT applications. It supports mixed precision (4/8/16-bit) and reports up to 4x higher throughput within the same hardware resources.
Why It Matters
As AIoT applications grow, the demand for efficient processing solutions increases. CORVET's innovative design addresses this need by balancing performance and resource consumption, making it relevant for developers and researchers focused on edge AI technologies.
Key Takeaways
- CORVET employs a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration.
- The design allows dynamic reconfiguration between approximate and accurate processing modes.
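- Achieves up to 4x throughput improvement within the same hardware resources through vectorised, time-multiplexed execution and flexible precision scaling (4/8/16-bit).
- In a 256-PE ASIC configuration, reports a compute density of 4.83 TOPS/mm² and an energy efficiency of 11.67 TOPS/W.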
- Includes a hardware-software co-design methodology for practical applications.
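To make the CORDIC-based MAC and the approximate/accurate trade-off listed above more concrete, here is a minimal Python sketch of a linear-mode CORDIC multiply-accumulate that uses only shifts and adds. It is not the authors' design: the fixed-point width `FRAC_BITS`, the names `cordic_mac` and `dot`, and the iteration counts `APPROX_ITERS` and `ACCURATE_ITERS` are all assumptions chosen for illustration. The sketch only shows the general idea that fewer iterations lower latency at the cost of a larger error.

```python
# Illustrative sketch only: a software model of linear-mode CORDIC used as a MAC.
# All constants and names below are hypothetical, not taken from the paper.

FRAC_BITS = 14        # assumed fixed-point fraction width
APPROX_ITERS = 4      # assumed iteration count for an "approximate" mode
ACCURATE_ITERS = 12   # assumed iteration count for an "accurate" mode


def to_fixed(v: float) -> int:
    """Convert a real value to fixed point with FRAC_BITS fractional bits."""
    return int(round(v * (1 << FRAC_BITS)))


def to_float(v: int) -> float:
    """Convert a fixed-point value back to a Python float."""
    return v / (1 << FRAC_BITS)


def cordic_mac(acc: int, x: int, z: int, iterations: int) -> int:
    """Return approximately acc + x*z (all fixed point) with shift-add steps.

    Linear-mode CORDIC drives z toward zero in steps of 2**-i and moves the
    accumulator by x * 2**-i in the matching direction. It converges for
    |z| < 2.0 (in real units); the residual error after n iterations is at
    most |x| * 2**-(n-1), plus shift truncation.
    """
    y = acc
    for i in range(iterations):
        step = (1 << FRAC_BITS) >> i   # 2**-i in fixed point
        if z >= 0:
            y += x >> i                # add the shifted multiplicand
            z -= step
        else:
            y -= x >> i                # subtract the shifted multiplicand
            z += step
    return y


def dot(xs, ws, iterations: int) -> float:
    """Dot product built from repeated CORDIC MAC steps."""
    acc = 0
    for x, w in zip(xs, ws):
        acc = cordic_mac(acc, to_fixed(x), to_fixed(w), iterations)
    return to_float(acc)


if __name__ == "__main__":
    xs = [0.5, -0.25, 0.75]
    ws = [0.3, 0.8, -0.6]
    exact = sum(x * w for x, w in zip(xs, ws))
    print("exact      :", exact)
    print("approximate:", dot(xs, ws, APPROX_ITERS))
    print("accurate   :", dot(xs, ws, ACCURATE_ITERS))
```

In this model the iteration count is the only knob: a call such as `dot(xs, ws, APPROX_ITERS)` finishes in fewer steps, while `dot(xs, ws, ACCURATE_ITERS)` spends more steps to shrink the residual error. How CORVET actually switches modes at runtime is not described in the text available here.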
Computer Science > Hardware Architecture
arXiv:2602.19268 (cs)
[Submitted on 22 Feb 2026]
Title: CORVET: A CORDIC-Powered, Resource-Frugal Mixed-Precision Vector Processing Engine for High-Throughput AIoT applications
Authors: Sonu Kumar, Mohd Faisal Khan, Mukul Lokhande, Santosh Kumar Vishvakarma
Abstract: This brief presents a runtime-adaptive, performance-enhanced vector engine featuring a low-resource, iterative CORDIC-based MAC unit for edge AI acceleration. The proposed design enables dynamic reconfiguration between approximate and accurate modes, exploiting the latency-accuracy trade-off for a wide range of workloads. Its resource-efficient approach further enables up to 4x throughput improvement within the same hardware resources by leveraging vectorised, time-multiplexed execution and flexible precision scaling. With a time-multiplexed multi-AF block and a lightweight pooling and normalisation unit, the proposed vector engine supports flexible precision (4/8/16-bit) and high MAC density. The ASIC implementation results show that each MAC stage can save up to 33% of time and 21% of power, with a 256-PE configuration that achieves higher compute density (4.83 TOPS/mm²) and energy efficiency (11.67 TOPS/W) than previous state-of-the-art work. A detailed har...