[2508.09888] Modern Neural Networks for Small Tabular Datasets: The New Default for Field-Scale Digital Soil Mapping?
Summary
This article benchmarks modern neural networks for small tabular datasets in field-scale digital soil mapping and reports that they outperform classical machine learning methods.
Why It Matters
The findings challenge the long-standing reliance on classical machine learning in pedometrics, suggesting that modern neural networks can significantly enhance soil property predictions. This has implications for agricultural practices and environmental monitoring, making advanced machine learning techniques more accessible for field-scale applications.
Key Takeaways
- Modern ANNs outperform classical methods in soil property prediction.
- TabPFN is recommended as the new default model for pedometricians.
- The study evaluates models on 31 datasets, highlighting the robustness of modern architectures.
- Deep learning has matured to effectively handle small sample sizes in soil spectroscopy.
- This shift may influence future practices in digital soil mapping and agricultural modeling.
arXiv:2508.09888 (cs), Computer Science > Machine Learning
Submitted on 13 Aug 2025 (v1), last revised 24 Feb 2026 (this version, v2)
Authors: Viacheslav Barkov, Jonas Schmidinger, Robin Gebbers, Martin Atzmueller
Abstract
In the field of pedometrics, tabular machine learning is the predominant method for soil property prediction from remote and proximal soil sensing data, forming a central component of Digital Soil Mapping (DSM). At the field-scale, this predictive soil modeling (PSM) task is typically constrained by small training sample sizes and high feature-to-sample ratios in soil spectroscopy. Traditionally, these conditions have proven challenging for conventional deep learning methods. Classical machine learning algorithms, particularly tree-based models like Random Forest and linear models such as Partial Least Squares Regression, have long been the default choice for pedometric modeling within DSM. Recent advances in artificial neural networks (ANN) for tabular data challenge this view, yet their suitability for field-scale DSM has not been proven. We introduce a comprehensive benchmark that evaluates state-of-the-art ANN architectures, including the...