Bag of Naïve Bayes: biomarker selection and classification from genome-wide SNP data

Bibliographic Details
Published in	BMC bioinformatics Vol. 13; no. Suppl 14; p. S2
Main Authors	Sambo, Francesco; Trifoglio, Emanuele; Di Camillo, Barbara; Toffolo, Gianna M; Cobelli, Claudio
Format	Journal Article
Language	English
Published	London: BioMed Central, 07.09.2012; Springer Nature B.V
Subjects	Algorithms; Bayes Theorem; Bayesian analysis; Bioinformatics; Biomarkers; Biomedical and Life Sciences; Case-Control Studies; Classification; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Conferences; Data processing; Design; Diabetes mellitus; Diabetes Mellitus, Type 1 - genetics; Disease; Female; Genetic Markers; Genetics; Genome-Wide Association Study; Genomes; Heritability; Humans; Life Sciences; Logistic Models; Microarrays; Polymorphism, Single Nucleotide; Probability; Random variables; Single-nucleotide polymorphism; Studies; Ranking Step; Classification Performance; Marginal Utility; High Classification Performance; Total Computational Complexity
ISSN	1471-2105
DOI	10.1186/1471-2105-13-S14-S2

Summary:	Background: Multifactorial diseases arise from complex patterns of interaction between a set of genetic traits and the environment. To fully capture the genetic biomarkers that jointly explain the heritability component of a disease, all SNPs from a genome-wide association study should therefore be analyzed simultaneously. Results: In this paper, we present Bag of Naïve Bayes (BoNB), an algorithm for genetic biomarker selection and subject classification from the simultaneous analysis of genome-wide SNP data. BoNB is based on the Naïve Bayes classification framework, enriched by three main features: bootstrap aggregating of an ensemble of Naïve Bayes classifiers; a novel strategy for ranking and selecting the attributes used by each classifier in the ensemble; and a permutation-based procedure for selecting significant biomarkers, based on their marginal utility in the classification process. BoNB is tested on the Wellcome Trust Case-Control study on Type 1 Diabetes, and its performance is compared with that of both a standard Naïve Bayes algorithm and HyperLASSO, a state-of-the-art penalized logistic regression algorithm for simultaneous genome-wide data analysis. Conclusions: The significantly higher classification accuracy obtained by BoNB, together with the significance of the biomarkers identified from the Type 1 Diabetes dataset, proves the effectiveness of BoNB as an algorithm for both classification and biomarker selection from genome-wide SNP data. Availability: Source code of the BoNB algorithm is released under the GNU General Public Licence and is available at http://www.dei.unipd.it/~sambofra/bonb.html .
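
Illustrative sketch (not part of the record): the summary names bootstrap aggregating of an ensemble of Naïve Bayes classifiers as the first of BoNB's three features. The Python below is a minimal sketch of that general idea only, and it is not the authors' released code. It assumes genotypes are coded 0, 1 or 2 and class labels are 0 (control) or 1 (case); the function names, the Laplace smoothing, the majority vote and the toy data are all assumptions made for illustration. The attribute ranking and selection strategy and the permutation-based biomarker selection described in the summary are not reproduced here.

```python
import math
import random
from collections import Counter


def fit_naive_bayes(X, y, n_values=3, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (assumed)."""
    n_features = len(X[0])
    log_prior, log_lik = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_prior[c] = math.log(len(rows) / len(X))
        total = len(rows) + alpha * n_values
        log_lik[c] = []
        for j in range(n_features):
            counts = Counter(row[j] for row in rows)
            # log P(SNP j = v | class c) for each genotype value v
            log_lik[c].append(
                [math.log((counts[v] + alpha) / total) for v in range(n_values)]
            )
    return log_prior, log_lik


def predict_naive_bayes(model, x):
    """Return the class with the highest posterior log-score for sample x."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][j][v] for j, v in enumerate(x))
        for c in log_prior
    }
    return max(scores, key=scores.get)


def fit_bagged(X, y, n_models=25, seed=0):
    """Train one Naive Bayes model per bootstrap resample of the subjects."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_naive_bayes([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Aggregate the ensemble by majority vote (an assumed aggregation rule)."""
    votes = Counter(predict_naive_bayes(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data (fabricated for illustration): SNPs 0 and 1 track the class,
# SNP 2 is uninformative.
X = [[2, 2, k] for k in range(3)] * 5 + [[0, 0, k] for k in range(3)] * 5
y = [1] * 15 + [0] * 15
models = fit_bagged(X, y)
```

Calling predict_bagged(models, sample) on a new genotype vector returns the majority class across the ensemble. On genome-wide data, each classifier would in practice be restricted to a subset of attributes, which is the role of the ranking and selection step the summary describes.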