[2602.22777] KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling
Summary
The paper introduces KMLP, a hybrid architecture designed for scalable predictive modeling on web-scale tabular data, addressing the scalability limits of traditional models such as gradient-boosted trees.
Why It Matters
As data continues to grow exponentially, traditional modeling techniques struggle with scalability and efficiency. KMLP offers a novel solution that integrates advanced neural network architectures to improve performance on large datasets, making it relevant for data scientists and machine learning practitioners focused on big data applications.
Key Takeaways
- KMLP combines a shallow Kolmogorov-Arnold Network front-end with a Gated Multilayer Perceptron backbone for enhanced data modeling.
- The architecture effectively handles challenges like anisotropy and non-stationarity in large datasets.
- Experiments demonstrate KMLP's superior performance over traditional models, particularly as dataset sizes increase.
arXiv:2602.22777 (cs) [Submitted on 26 Feb 2026]
Subject: Computer Science > Machine Learning
Title: KMLP: A Scalable Hybrid Architecture for Web-Scale Tabular Data Modeling
Authors: Mingming Zhang, Pengfei Shi, Zhiqing Xiao, Feng Zhao, Guandong Sun, Yulin Kang, Ruizhe Gao, Ningtao Wang, Xing Fu, Weiqiang Wang, Junbo Zhao
Abstract: Predictive modeling on web-scale tabular data with billions of instances and hundreds of heterogeneous numerical features faces significant scalability challenges. These features exhibit anisotropy, heavy-tailed distributions, and non-stationarity, creating bottlenecks for models like Gradient Boosting Decision Trees and requiring laborious manual feature engineering. We introduce KMLP, a hybrid deep architecture integrating a shallow Kolmogorov-Arnold Network (KAN) front-end with a Gated Multilayer Perceptron (gMLP) backbone. The KAN front-end uses learnable activation functions to automatically model complex non-linear transformations for each feature, while the gMLP backbone captures high-order interactions. Experiments on public benchmarks and an industrial dataset with billions of samples show KMLP achieves state-of-the-art performance, with advantages over baselines like GBDTs increasing at larger scales, validating KMLP as a scalable deep learning paradigm for large-scale web tabular data.
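To make the two-stage design concrete, here is a minimal NumPy sketch of the forward pass: a KAN-style front-end that gives each feature its own learnable non-linear transform, feeding a simplified gated-MLP block. This is an illustrative approximation, not the authors' implementation; the radial-basis activations stand in for the paper's learnable spline functions, the gating is a reduced form of gMLP's spatial gating unit, and all function names, shapes, and initializations here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kan_frontend(x, centers, weights):
    """KAN-style front-end: each feature gets its own learnable activation.

    Here the activation is a weighted sum of Gaussian radial basis
    functions, a stand-in for the learnable splines described in the
    paper. Shapes: x (batch, n_feat), centers/weights (n_feat, n_basis).
    """
    # Basis responses per feature: (batch, n_feat, n_basis)
    basis = np.exp(-(x[..., None] - centers[None]) ** 2)
    # Weighted sum over the basis axis -> transformed features (batch, n_feat)
    return (basis * weights[None]).sum(axis=-1)

def gmlp_block(h, w_in, w_gate, w_out):
    """Reduced gMLP block: project up, split, gate one half with a
    linear mix of the other, project back down."""
    z = np.maximum(h @ w_in, 0.0)      # ReLU in place of GELU for brevity
    u, v = np.split(z, 2, axis=-1)     # two halves for the gating unit
    v = v @ w_gate                     # linear mixing of the gate path
    return (u * v) @ w_out             # element-wise gate, then project

# Toy dimensions (hypothetical): 4 features, 8 basis functions, width 16
batch, n_feat, n_basis, d_hidden = 32, 4, 8, 16
x = rng.normal(size=(batch, n_feat))
centers = rng.normal(size=(n_feat, n_basis))
w_basis = 0.1 * rng.normal(size=(n_feat, n_basis))
w_in = 0.1 * rng.normal(size=(n_feat, 2 * d_hidden))
w_gate = 0.1 * rng.normal(size=(d_hidden, d_hidden))
w_out = 0.1 * rng.normal(size=(d_hidden, 1))

h = kan_frontend(x, centers, w_basis)   # per-feature transforms
y = gmlp_block(h, w_in, w_gate, w_out)  # high-order feature interactions
print(h.shape, y.shape)                 # (32, 4) (32, 1)
```

The division of labor mirrors the paper's motivation: the front-end absorbs per-feature irregularities (heavy tails, non-stationarity) so the backbone can focus on cross-feature interactions, replacing much of the manual feature engineering that GBDT pipelines require.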