selectBoost: a general algorithm to enhance the performance of variable selection methods


Bibliographic Details
Published in Bioinformatics Vol. 37; no. 5; pp. 659-668
Main Authors Bertrand, Frédéric; Aouadi, Ismaïl; Jung, Nicolas; Carapito, Raphael; Vallat, Laurent; Bahram, Seiamak; Maumy-Bertrand, Myriam
Format Journal Article
Language English
Published England: Oxford University Press (OUP), 05.05.2021
ISSN 1367-4803
1367-4811
1460-2059
DOI 10.1093/bioinformatics/btaa855

Summary: Abstract
Motivation: With the growth of big data, variable selection has become one of the critical challenges in statistics. Although many methods have been proposed in the literature, their performance in terms of recall (sensitivity) and precision (predictive positive value) is limited in a context where the number of variables by far exceeds the number of observations or in a highly correlated setting.
Results: In this article, we propose a general algorithm, which improves the precision of any existing variable selection method. This algorithm is based on highly intensive simulations and takes into account the correlation structure of the data. Our algorithm can either produce a confidence index for variable selection or be used in an experimental design planning perspective. We demonstrate the performance of our algorithm on both simulated and real data. We then apply it in two different ways to improve biological network reverse-engineering.
Availability and implementation: Code is available as the SelectBoost package on the CRAN, https://cran.r-project.org/package=SelectBoost. Some network reverse-engineering functionalities are available in the Patterns CRAN package, https://cran.r-project.org/package=Patterns.
Supplementary information: Supplementary data are available at Bioinformatics online.
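The summary evaluates variable selection by recall (sensitivity) and precision. As a minimal sketch of how these two quantities are usually computed for a selected variable set, the Python snippet below may help. It is not taken from the article or from the SelectBoost package: the function name `precision_recall`, the variable names, and the convention of returning 0.0 for an empty set are all assumptions made for illustration.

```python
def precision_recall(selected, truly_active):
    """Return (precision, recall) of a selected variable set.

    precision = selected variables that are truly active / all selected
    recall    = selected variables that are truly active / all truly active
    Assumption: an empty denominator yields 0.0 (other conventions exist).
    """
    selected, truly_active = set(selected), set(truly_active)
    true_positives = len(selected & truly_active)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(truly_active) if truly_active else 0.0
    return precision, recall


# Hypothetical example: a method selects four variables, two of which
# belong to the three truly active ones.
selected = ["x1", "x2", "x5", "x7"]
truly_active = ["x1", "x2", "x3"]
precision, recall = precision_recall(selected, truly_active)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case half of the selected variables are truly active and two of the three active variables are recovered. A procedure that improves precision, as the summary describes, would aim to discard false selections such as `x5` and `x7`.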
The authors wish it to be known that Ismaïl Aouadi and Nicolas Jung contributed equally.