A neutral comparison of algorithms to minimize L0 penalties for high‐dimensional variable selection

Bibliographic Details
Published in	Biometrical Journal, Vol. 66, no. 1, article e2200207
Main Author	Frommlet, Florian
Format	Journal Article
Language	English
Published	Weinheim: Wiley-VCH Verlag GmbH & Co. KGaA, 01.01.2024
Subjects	Algorithms; Bayesian analysis; Convexity; Criteria; Feature selection; Fines & penalties; Gene mapping; high‐dimensional data; L0 penalties; Mathematical models; neutral comparison; Quantitative trait loci; Statistical analysis; Statistical methods; variable selection
ISSN	0323-3847 1521-4036
DOI	10.1002/bimj.202200207

Summary:	Variable selection methods based on L0 penalties have excellent theoretical properties to select sparse models in a high‐dimensional setting. There exist modifications of the Bayesian Information Criterion (BIC) which either control the familywise error rate (mBIC) or the false discovery rate (mBIC2) in terms of which regressors are selected to enter a model. However, the minimization of L0 penalties comprises a mixed‐integer problem which is known to be NP‐hard and therefore becomes computationally challenging with increasing numbers of regressor variables. This is one reason why alternatives like the LASSO have become so popular, which involve convex optimization problems that are easier to solve. The last few years have seen some real progress in developing new algorithms to minimize L0 penalties. The aim of this article is to compare the performance of these algorithms in terms of minimizing L0‐based selection criteria. Simulation studies covering a wide range of scenarios that are inspired by genetic association studies are used to compare the values of selection criteria obtained with different algorithms. In addition, some statistical characteristics of the selected models and the runtime of algorithms are compared. Finally, the performance of the algorithms is illustrated in a real data example concerned with expression quantitative trait loci (eQTL) mapping.
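Illustration:	The combinatorial nature of the problem described in the summary can be seen in a small sketch. The Python code below is not taken from the article and does not reproduce any of the algorithms it compares; it is a brute-force search over all subsets up to a fixed size, which is only feasible for a handful of regressors. The function names (`rss`, `criterion`, `best_subset`), the simulated data, and the exact penalty forms are assumptions made for illustration: BIC is taken as n log(RSS/n) + k log n, mBIC adds 2k log(p/4), and mBIC2 further subtracts 2 log(k!), which is how these criteria are commonly written for linear regression.

```python
import itertools
import math

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of a least-squares fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def criterion(X, y, subset, kind="bic"):
    """L0-type selection criterion for the model given by `subset`.

    Assumed forms (illustrative, not quoted from the article):
      bic   : n*log(RSS/n) + k*log(n)
      mbic  : bic + 2*k*log(p/4)
      mbic2 : mbic - 2*log(k!)
    """
    n, p = X.shape
    k = len(subset)
    value = n * math.log(rss(X, y, subset) / n) + k * math.log(n)
    if kind in ("mbic", "mbic2"):
        value += 2 * k * math.log(p / 4)
    if kind == "mbic2":
        value -= 2 * math.lgamma(k + 1)  # lgamma(k + 1) = log(k!)
    return value


def best_subset(X, y, kind="bic", max_size=3):
    """Exhaustive search over all subsets with at most `max_size` regressors."""
    p = X.shape[1]
    best_value, best_set = criterion(X, y, (), kind), ()
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(p), k):
            value = criterion(X, y, subset, kind)
            if value < best_value:
                best_value, best_set = value, subset
    return best_value, best_set


# Simulated toy data: 8 candidate regressors, two of which have an effect.
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 1] - 3.0 * X[:, 5] + rng.standard_normal(n)

value, selected = best_subset(X, y, kind="mbic2")
print("selected regressors:", selected)
```

The number of candidate subsets grows combinatorially with the number of regressors, so an exhaustive search of this kind quickly becomes infeasible in the high‐dimensional settings the article addresses.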