[2510.06162] TabPFN-Wide: Continued Pre-Training for Extreme Feature Counts
Computer Science > Machine Learning
arXiv:2510.06162 (cs)
[Submitted on 7 Oct 2025 (v1), last revised 29 Mar 2026 (this version, v2)]

Title: TabPFN-Wide: Continued Pre-Training for Extreme Feature Counts
Authors: Christopher Kolberg, Jules Kreuer, Jonas Huurdeman, Sofiane Ouaari, Katharina Eggensperger, Nico Pfeifer

Abstract: Revealing novel insights from the relationship between molecular measurements and pathology remains a highly impactful application of machine learning in biomedicine. Data in this domain typically contain only a few observations but thousands of potentially noisy features, posing challenges for conventional tabular machine learning approaches. While prior-data fitted networks have emerged as foundation models for predictive tabular tasks, they are currently not suited to handling large feature counts (>500). Although feature reduction enables their application, it hinders feature importance analysis. We propose a strategy that extends existing models through continued pre-training on synthetic data sampled from a customized prior. The resulting model, TabPFN-Wide, matches or exceeds its base model's performance while exhibiting improved robustness to noise. It seamlessly scales beyond 30,000 categorical and continuous features, regardless of noise levels, while maintaining inherent interpretability.
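To make the strategy in the abstract concrete, below is a minimal sketch (not the authors' code) of continued pre-training on synthetic "wide" tasks: few samples, thousands of features, most of them noise, drawn from a customized prior. All names here (sample_wide_task, TinyPFNStub, continued_pretraining) and the specific prior are hypothetical illustrations; a real PFN backbone is a transformer that attends over the whole (train, test) set rather than classifying rows independently.

```python
# Hypothetical sketch of continued pre-training on a stream of synthetic
# wide tabular tasks. Assumptions: a simple sparse-linear prior stands in
# for the paper's customized prior, and a per-row MLP stands in for the
# pre-trained PFN backbone, only to keep the example self-contained.
import torch
import torch.nn as nn


def sample_wide_task(n_samples=64, n_features=2000, n_informative=20,
                     noise_std=1.0, generator=None):
    """Draw one synthetic binary-classification task from a simple wide
    prior: a sparse set of informative features drives the label, the
    remaining features are pure noise."""
    g = generator if generator is not None else torch.Generator().manual_seed(0)
    X = torch.randn(n_samples, n_features, generator=g) * noise_std
    w = torch.zeros(n_features)
    informative = torch.randperm(n_features, generator=g)[:n_informative]
    w[informative] = torch.randn(n_informative, generator=g)
    y = (X @ w > 0).long()  # label from a sparse linear rule
    return X, y


class TinyPFNStub(nn.Module):
    """Stand-in for an already pre-trained PFN backbone."""

    def __init__(self, n_features, n_classes=2, d_model=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, d_model),
            nn.ReLU(),
            nn.Linear(d_model, n_classes),
        )

    def forward(self, X):
        return self.net(X)


def continued_pretraining(model, steps=200, lr=1e-4, n_features=2000):
    """Continue training a pre-trained model on freshly sampled wide
    synthetic tasks, one task per optimization step."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    g = torch.Generator().manual_seed(42)
    for _ in range(steps):
        X, y = sample_wide_task(n_features=n_features, generator=g)
        loss = loss_fn(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model


model = continued_pretraining(TinyPFNStub(n_features=2000))
```

Varying the feature count, the informative-feature fraction, and the noise level across sampled tasks is one way such a prior could expose the model to extreme widths during continued pre-training; the paper itself should be consulted for the actual prior design and training setup of TabPFN-Wide.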