[2507.21807] MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation
Summary
MIBoost is a novel gradient boosting algorithm that performs variable selection jointly across multiply imputed datasets, addressing the open problem of model selection in the presence of missing data.
Why It Matters
This research is significant as it provides a solution to the common problem of missing data in predictive modeling. By enhancing variable selection methods, MIBoost could improve the accuracy of predictions in various fields, making it a valuable tool for statisticians and data scientists dealing with incomplete datasets.
Key Takeaways
- MIBoost offers a unified variable-selection mechanism across multiple imputed datasets.
- The algorithm extends the unified-loss principle of recent LASSO and elastic-net approaches to component-wise gradient boosting.
- Simulation studies indicate MIBoost achieves comparable predictive performance to other advanced methods.
- Addressing missing data effectively can enhance model reliability and insights.
- The research contributes to ongoing discussions about optimal model selection techniques.
arXiv:2507.21807 [stat.ML] (Statistics > Machine Learning)
Submitted on 29 Jul 2025 (v1); last revised 23 Feb 2026 (this version, v5)
Title: MIBoost: A Gradient Boosting Algorithm for Variable Selection After Multiple Imputation
Authors: Robert Kuchen
Abstract: Statistical learning methods for automated variable selection, such as LASSO, elastic nets, or gradient boosting, have become increasingly popular tools for building powerful prediction models. Yet, in practice, analyses are often complicated by missing data. The most widely used approach to address missingness is multiple imputation, which involves creating several completed datasets. However, there is an ongoing debate on how to perform model selection in the presence of multiple imputed datasets. Simple strategies, such as pooling models across datasets, have been shown to have suboptimal properties. Although more sophisticated methods exist, they are often difficult to implement and therefore not widely applied. In contrast, two recent approaches modify the regularization methods LASSO and elastic nets by defining a single loss function, resulting in a unified set of coefficients across imputations. Our key contribution is to extend this principle to the framework of component-wise gradient boosting by proposing MIBoost, a novel algorithm that employs a u...
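The abstract's core idea, selecting one base learner per iteration by minimizing a single loss aggregated over all imputed datasets, can be illustrated with a small sketch. This is not the paper's implementation: the function name `miboost_sketch`, the L2 loss, and the pooled least-squares update are illustrative assumptions chosen to show how component-wise boosting yields one unified set of coefficients across imputations.

```python
import numpy as np

def miboost_sketch(imputed_Xs, ys, n_iter=50, nu=0.1):
    """Hedged sketch of MIBoost-style component-wise L2 boosting.

    Each iteration picks the single predictor j whose unified least-squares
    fit minimizes the squared-error loss summed over all M imputed datasets,
    so one shared coefficient vector is selected for every imputation.
    (Illustrative only; the paper's exact update rule may differ.)

    imputed_Xs: list of M arrays of shape (n, p); ys: list of M arrays (n,).
    """
    M = len(imputed_Xs)
    p = imputed_Xs[0].shape[1]
    beta = np.zeros(p)  # unified coefficients across imputations
    residuals = [y.astype(float).copy() for y in ys]
    for _ in range(n_iter):
        best_j, best_c, best_loss = None, 0.0, np.inf
        for j in range(p):
            # One coefficient fitted from the pooled loss over all imputations
            num = sum(imputed_Xs[m][:, j] @ residuals[m] for m in range(M))
            den = sum(imputed_Xs[m][:, j] @ imputed_Xs[m][:, j] for m in range(M))
            c = num / den
            loss = sum(
                np.sum((residuals[m] - c * imputed_Xs[m][:, j]) ** 2)
                for m in range(M)
            )
            if loss < best_loss:
                best_j, best_c, best_loss = j, c, loss
        # Shrunken update of the winning component, applied to every imputation
        beta[best_j] += nu * best_c
        for m in range(M):
            residuals[m] -= nu * best_c * imputed_Xs[m][:, best_j]
    return beta
```

Because only the predictor with the lowest aggregated loss is updated each round, variables never selected keep a coefficient of exactly zero, which is what makes the procedure a variable-selection method rather than a plain fitting routine.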