[2512.12850] KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation
Summary
The paper introduces KANELÉ, a framework that exploits Kolmogorov-Arnold Networks (KANs) for efficient FPGA-based neural network inference, reporting up to a 2700x speedup and orders-of-magnitude resource savings over prior KAN-on-FPGA approaches.
Why It Matters
KANELÉ addresses the growing need for low-latency, resource-efficient neural network implementations in real-time, low-power applications. By leveraging the unique structure of KANs, whose learnable one-dimensional spline activations map naturally onto FPGA lookup tables, it offers a systematic design flow for compact, high-throughput inference on reconfigurable hardware.
Key Takeaways
- KANELÉ improves FPGA-based inference speed by up to 2700x compared to prior KAN-on-FPGA approaches.
- Utilizes Kolmogorov-Arnold Networks for efficient LUT mapping and resource usage.
- Presents the first systematic design flow for KANs on FPGAs, co-optimizing training with quantization and pruning.
- Demonstrates versatility in real-time control systems and symbolic tasks.
- Surpasses traditional LUT-based architectures on standard benchmarks.
Computer Science > Hardware Architecture
arXiv:2512.12850 (cs)
[Submitted on 14 Dec 2025 (v1), last revised 18 Feb 2026 (this version, v2)]
Title: KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation
Authors: Duc Hoang, Aarush Gupta, Philip Harris
Abstract: Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, pa...
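The abstract's key observation is that a KAN edge activation is a one-dimensional function over a fixed domain, so it can be tabulated and evaluated by a single address lookup. The sketch below illustrates this idea in NumPy: it is not the paper's implementation, and `spline_to_lut`, `lut_eval`, and the bit-width/domain choices are illustrative assumptions; `np.tanh` merely stands in for a trained spline.

```python
import numpy as np

def spline_to_lut(activation, domain=(-1.0, 1.0), addr_bits=8, out_bits=8):
    """Tabulate a 1-D edge activation over its fixed domain.

    `activation` stands in for a trained KAN spline; any callable works here.
    The table has 2**addr_bits entries; outputs are quantized to signed
    integers with `out_bits` bits (symmetric quantization).
    """
    lo, hi = domain
    grid = np.linspace(lo, hi, 2 ** addr_bits)        # LUT addresses
    values = activation(grid)                          # sampled activation
    scale = np.max(np.abs(values)) / (2 ** (out_bits - 1) - 1)
    lut = np.round(values / scale).astype(np.int32)   # quantized entries
    return lut, scale

def lut_eval(lut, scale, x, domain=(-1.0, 1.0)):
    """Evaluate the activation by address lookup, as FPGA hardware would."""
    lo, hi = domain
    idx = np.clip(((x - lo) / (hi - lo) * (len(lut) - 1)).astype(int),
                  0, len(lut) - 1)
    return lut[idx] * scale

# Tabulate a smooth nonlinearity and check the approximation error.
lut, scale = spline_to_lut(np.tanh, addr_bits=8, out_bits=8)
x = np.linspace(-1.0, 1.0, 1000)
err = np.max(np.abs(lut_eval(lut, scale, x) - np.tanh(x)))
print(len(lut), f"{err:.4f}")
```

With 8 address bits and 8 output bits, the table has 256 entries and the worst-case error on this toy example stays around 0.01, which is the kind of discretization trade-off the paper's quantization-aware training flow would manage during training rather than after it.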