A greedy stacking algorithm for model ensembling and domain weighting

Bibliographic Details
Published in BMC Research Notes Vol. 13; no. 1; pp. 70-6
Main Authors Kurz, Christoph F., Maier, Werner, Rink, Christian
Format Journal Article
Language English
Published London: BioMed Central, 12.02.2020
ISSN 1756-0500
DOI 10.1186/s13104-020-4931-7

More Information
Summary: Objective: Because it is impossible to know in advance which statistical learning algorithm will perform best on a given prediction task, stacking methods are commonly used to ensemble individual learners into a single, more powerful learner. Stacking algorithms are usually based on linear models, which can run into problems, especially when the base learners' predictions are highly correlated. In this study, we develop a greedy algorithm for model stacking that overcomes this issue while remaining very fast and easy to interpret. We evaluate the greedy algorithm on 7 data sets from various biomedical disciplines and compare it to linear stacking, genetic algorithm stacking, and a brute-force approach in different prediction settings. We further apply the algorithm to optimize the weighting of the single domains (e.g., income, education) that make up the German Index of Multiple Deprivation (GIMD) so that the index is highly correlated with mortality. Results: The greedy stacking algorithm provides good ensemble weights and outperforms the linear stacker on many tasks. The brute-force approach is slightly superior but computationally expensive. The greedy weighting algorithm has a variety of possible applications and is fast and efficient. A Python implementation is provided.
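The abstract does not spell out the steps of the greedy weighting procedure, so the following is a minimal sketch of one common greedy ensemble-weighting scheme (forward selection of base learners with replacement, averaging the selected predictions) that matches the high-level description above. The function name greedy_ensemble_weights, its interface, and the use of mean squared error as the loss are illustrative assumptions, not the authors' provided implementation.

import numpy as np
from sklearn.metrics import mean_squared_error

def greedy_ensemble_weights(predictions, y_true, n_iter=100):
    # predictions: array of shape (n_models, n_samples) holding each base
    #   learner's out-of-fold predictions on a validation set.
    # y_true: array of shape (n_samples,) with the observed targets.
    # Returns a weight vector of shape (n_models,) that sums to 1.
    n_models = predictions.shape[0]
    counts = np.zeros(n_models)                        # how often each model was picked
    ensemble_sum = np.zeros_like(y_true, dtype=float)  # running sum of picked predictions

    for step in range(n_iter):
        best_model, best_score = 0, np.inf
        for m in range(n_models):
            # score the averaged ensemble if model m were added one more time
            candidate = (ensemble_sum + predictions[m]) / (step + 1)
            score = mean_squared_error(y_true, candidate)
            if score < best_score:
                best_model, best_score = m, score
        counts[best_model] += 1
        ensemble_sum += predictions[best_model]

    return counts / counts.sum()

For a regression task one would pass the base learners' out-of-fold predictions and the validation targets, e.g. weights = greedy_ensemble_weights(oof_preds, y_val), and combine test-set predictions as weights @ test_preds; the same weighting idea can be applied to combining index domains such as those of the GIMD.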