[2510.02625] TabImpute: Universal Zero-Shot Imputation for Tabular Data
Summary
The paper presents TabImpute, a pre-trained transformer built on TabPFN that imputes missing entries in tabular data zero-shot, with no fitting or hyperparameter tuning at inference time. The authors report a 100x speedup over the previous TabPFN imputation method.
Why It Matters
Missing data is a common challenge in data analysis, impacting the reliability of insights drawn from datasets. TabImpute addresses this issue by providing a fast, accurate solution that requires no hyperparameter tuning, making it accessible for various applications across domains like medicine and finance.
Key Takeaways
- TabImpute offers a zero-shot imputation method that eliminates the need for model fitting.
- An entry-wise featurization enables a 100x speedup over the previous TabPFN imputation method.
- It is trained on synthetic data generated with a diverse set of missingness patterns, which is intended to improve accuracy on real-world missing-data problems.
- MissBench, a new benchmark introduced in the study, evaluates TabImpute on 42 OpenML tables.
- TabImpute demonstrates robust performance across diverse domains, including finance and healthcare.
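To make the idea of missingness patterns and imputation benchmarking concrete, here is a minimal sketch in Python. It is not the paper's pipeline or MissBench itself: the function names, the two mask generators (one missing-completely-at-random, one simple missing-not-at-random rule), the column-mean baseline, and the RMSE metric are all illustrative assumptions.

```python
import numpy as np

def mcar_mask(shape, p, rng):
    """Missing completely at random: hide each entry independently with probability p."""
    return rng.random(shape) < p

def mnar_threshold_mask(X, quantile):
    """A simple missing-not-at-random rule: hide entries above their column's quantile."""
    cutoffs = np.quantile(X, quantile, axis=0)
    return X > cutoffs

def column_mean_impute(X, mask):
    """Baseline imputer: fill hidden entries with the mean of the observed column values."""
    X_obs = np.where(mask, np.nan, X)
    col_means = np.nanmean(X_obs, axis=0)
    return np.where(mask, col_means, X)

def masked_rmse(X_true, X_hat, mask):
    """Root-mean-square error computed only over the hidden entries."""
    diff = (X_true - X_hat)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical toy table standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

mask = mcar_mask(X.shape, 0.2, rng)       # True marks a hidden entry
X_hat = column_mean_impute(X, mask)
rmse = masked_rmse(X, X_hat, mask)

mnar = mnar_threshold_mask(X, 0.75)       # hides roughly the top quarter of each column
```

Swapping `column_mean_impute` for another imputer and repeating the loop over several mask generators gives the general shape of a comparison across missingness patterns, under the assumptions stated above.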
Computer Science > Machine Learning
arXiv:2510.02625 (cs)
[Submitted on 3 Oct 2025 (v1), last revised 17 Feb 2026 (this version, v4)]
Title: TabImpute: Universal Zero-Shot Imputation for Tabular Data
Authors: Jacob Feitelberg, Dwaipayan Saha, Kyuseong Choi, Zaid Ahmad, Anish Agarwal, Raaz Dwivedi
Abstract: Missing data is a widespread problem in tabular settings. Existing solutions range from simple averaging to complex generative adversarial networks, but due to each method's large variance in performance across real-world domains and time-consuming hyperparameter tuning, no universal imputation method exists. This performance variance is particularly pronounced in small datasets, where the models have the least amount of information. Building on TabPFN, a recent tabular foundation model for supervised learning, we propose TabImpute, a pre-trained transformer that delivers accurate and fast zero-shot imputations, requiring no fitting or hyperparameter tuning at inference time. To train and evaluate TabImpute, we introduce (i) an entry-wise featurization for tabular settings, enabling a 100x speedup over the previous TabPFN imputation method, (ii) a synthetic training data generation pipeline incorporating a diverse set of missingness patterns to enhance accuracy on real-world missing data problems, and (iii) MissBench...
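The abstract names an entry-wise featurization but does not describe it in the text above. The sketch below shows one hypothetical reading, in which every table entry becomes its own example built from the other values in its row and column. The function `entrywise_features` and the feature layout are assumptions made for illustration, not the authors' construction.

```python
import numpy as np

def entrywise_features(X_obs):
    """Turn a table with np.nan at missing entries into one example per entry.

    Hypothetical layout: the features for entry (i, j) are row i followed by
    column j, with the entry itself hidden so the target does not leak.
    """
    n, m = X_obs.shape
    feats, targets, index = [], [], []
    for i in range(n):
        for j in range(m):
            row = X_obs[i].copy()
            col = X_obs[:, j].copy()
            row[j] = np.nan   # hide the target inside its own row
            col[i] = np.nan   # hide the target inside its own column
            feats.append(np.concatenate([row, col]))
            targets.append(X_obs[i, j])
            index.append((i, j))
    return np.array(feats), np.array(targets), index

# Toy 2 x 3 table with two missing entries.
X_obs = np.array([[1.0, 2.0, np.nan],
                  [4.0, np.nan, 6.0]])
feats, targets, index = entrywise_features(X_obs)

# Observed entries could serve as labelled context, missing ones as queries.
is_query = np.isnan(targets)
```

Under this reading, all missing entries of a table can be posed to a model as queries in a single batch, with the observed entries as context. Whether that matches the paper's actual design is not confirmed by the text above.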