[2602.18758] UFO: Unlocking Ultra-Efficient Quantized Private Inference with Protocol and Algorithm Co-Optimization
Summary
The paper presents UFO, a quantized two-party computation framework that optimizes private CNN inference by combining efficient protocols and quantization algorithms, achieving significant communication reduction and improved accuracy.
Why It Matters
As privacy concerns grow in AI applications, efficient secure inference is essential. UFO tackles the high communication and latency costs of private CNN inference, a major bottleneck for deploying cryptographic AI solutions.
Key Takeaways
- UFO optimizes private CNN inference by co-optimizing protocols and quantization algorithms.
- The framework achieves up to 11.7x communication reduction while maintaining or improving accuracy.
- Novel graph-level optimizations and a mixed-precision QAT algorithm enhance model performance under communication constraints.
Computer Science > Cryptography and Security
arXiv:2602.18758 (cs) [Submitted on 21 Feb 2026]
Title: UFO: Unlocking Ultra-Efficient Quantized Private Inference with Protocol and Algorithm Co-Optimization
Authors: Wenxuan Zeng, Chao Yang, Tianshi Xu, Bo Zhang, Changrui Ren, Jin Dong, Meng Li
Abstract: Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, especially from convolution layers. In this paper, we propose UFO, a quantized 2PC inference framework that jointly optimizes the 2PC protocols and quantization algorithm. UFO features a novel 2PC protocol that systematically combines the efficient Winograd convolution algorithm with quantization to improve inference efficiency. However, we observe that naively combining quantization and Winograd convolution faces the following challenges: 1) From the inference perspective, Winograd transformations introduce extensive additions and require frequent bit width conversions to avoid inference overflow, leading to non-negligible communication overhead; 2) From the training perspective, Winograd transformations introduce weight outliers that make quantization-aware training (QAT) difficult, resulting in inferior model accuracy. To address th...
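To make the abstract's trade-off concrete, the sketch below shows the standard Winograd F(2,3) algorithm that UFO's protocol builds on: it computes two outputs of a 3-tap 1D convolution with 4 multiplications instead of 6, at the cost of extra additions in the input and output transforms (the additions that, in a quantized 2PC setting, grow intermediate bit widths and force the conversions the paper describes). This is a plaintext NumPy illustration of the classic algorithm, not the paper's protocol; the matrices `BT`, `G`, `AT` are the textbook F(2,3) transforms.

```python
import numpy as np

# Standard Winograd F(2,3) transform matrices (textbook values).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float64)  # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)    # output transform

def winograd_f23(d, g):
    """d: input tile of 4 samples, g: 3-tap filter -> 2 outputs."""
    U = G @ g    # filter transform (done once per filter)
    V = BT @ d   # input transform: only additions/subtractions
    M = U * V    # 4 element-wise multiplications (vs. 6 direct ones)
    return AT @ M  # output transform: additions again

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.0, -1.0])

# Direct sliding-window correlation for reference.
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

Note how every multiplication in `M` is preceded and followed by sums of several terms; with fixed-point quantized shares, each such sum can add a bit to the required word length, which is exactly the overflow/bit-width-conversion pressure the abstract identifies.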