[2602.23182] Closing the gap on tabular data with Fourier and Implicit Categorical Features
Summary
This paper explores how deep learning can better handle tabular data by addressing its limitations compared to tree-based methods, particularly through feature preprocessing techniques.
Why It Matters
Deep learning has struggled with tabular data, often yielding inferior results compared to traditional methods like XGBoost. This research proposes feature-preprocessing techniques to close that gap, potentially changing how machine learning practitioners approach tabular datasets.
Key Takeaways
- Deep learning models traditionally underperform on tabular data compared to tree-based methods.
- The study introduces statistical feature processing to identify features that behave like categoricals once discretized.
- Learned Fourier features help mitigate deep models' bias toward overly smooth solutions.
- The proposed methods can match or exceed the performance of XGBoost.
- This research could influence future approaches to handling tabular data in machine learning.
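The statistical feature processing mentioned above can be illustrated with a small sketch. This is a hypothetical example, not the authors' code: it bins each numeric column and scores the binned feature's mutual information with the target, so columns that become strongly predictive once discretized (implicit categoricals) stand out. The function name `discretized_mi` and the bin count are made up for illustration.

```python
import numpy as np

def discretized_mi(x, y, n_bins=10):
    """Mutual information (in nats) between a quantile-binned numeric
    feature x and a discrete target y with labels 0..K-1."""
    # Quantile bin edges, then assign each value to a bin index in [0, n_bins).
    bins = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    xb = np.clip(np.searchsorted(bins, x, side="right") - 1, 0, n_bins - 1)
    # Empirical joint distribution of (bin, label).
    joint = np.zeros((n_bins, int(y.max()) + 1))
    for b, t in zip(xb, y):
        joint[b, t] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
# A feature with hidden categorical structure: the target depends only on
# which of three clusters the value falls into.
x_cat = rng.choice([0.0, 5.0, 10.0], size=2000) + rng.normal(0, 0.3, 2000)
y = (x_cat > 2.5).astype(int) + (x_cat > 7.5).astype(int)
# A feature unrelated to the target.
x_noise = rng.normal(0, 1, 2000)

print(discretized_mi(x_cat, y))    # high: implicit categorical feature
print(discretized_mi(x_noise, y))  # near zero: no dependence on the target
```

A preprocessing pipeline could rank all columns by such a score and route the high-scoring ones through a categorical-style encoding rather than uniform numeric processing.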
Computer Science > Machine Learning
arXiv:2602.23182 (cs) [Submitted on 26 Feb 2026]
Title: Closing the gap on tabular data with Fourier and Implicit Categorical Features
Authors: Marius Dragoi, Florin Gogianu, Elena Burceanu
Abstract: While Deep Learning has demonstrated impressive results in applications on various data types, it continues to lag behind tree-based methods when applied to tabular data, often referred to as the last "unconquered castle" for neural networks. We hypothesize that a significant advantage of tree-based methods lies in their intrinsic capability to model and exploit non-linear interactions induced by features with categorical characteristics. In contrast, neural-based methods exhibit biases toward uniform numerical processing of features and smooth solutions, making it challenging for them to effectively leverage such patterns. We address this performance gap by using statistical-based feature processing techniques to identify features that are strongly correlated with the target once discretized. We further mitigate the bias of deep models for overly-smooth solutions, a bias that does not align with the inherent properties of the data, using Learned Fourier. We show that our proposed feature preprocessing significantly boosts the performance of deep learning models and enab...
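The "Learned Fourier" idea in the abstract counters the smoothness bias by passing inputs through sinusoids whose frequencies are trainable, in the spirit of Fourier feature mappings. Below is a minimal sketch of such an embedding layer; the shapes, the frequency scale, and the function name `fourier_features` are assumptions for illustration and make no claim to match the authors' architecture.

```python
import numpy as np

def fourier_features(x, B):
    """Map inputs x of shape (n, d) through a frequency matrix B of shape
    (d, m) into [cos, sin] features of shape (n, 2m).

    In a full model, B would be a learnable parameter updated by gradient
    descent; high frequencies let the downstream network fit sharp,
    non-smooth decision surfaces instead of overly smooth ones.
    """
    proj = 2 * np.pi * (x @ B)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(8, 3))      # 8 rows, 3 numeric columns
B = rng.normal(0, 10.0, size=(3, 16))   # frequency matrix; the scale is a hyperparameter
z = fourier_features(x, B)
print(z.shape)  # (8, 32)
```

The embedded features `z` would then replace (or concatenate with) the raw numeric inputs before the first dense layer of the network.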