Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 27; no. 4; pp. 495-501
Main Authors Stingo, Francesco C.; Vannucci, Marina
Format Journal Article
Language English
Published Oxford: Oxford University Press, 15.02.2011
ISSN 1367-4803
1367-4811
DOI 10.1093/bioinformatics/btq690

Summary: Motivation: Discriminant analysis is an effective tool for the classification of experimental units into groups. Here, we consider the typical problem of classifying subjects according to phenotypes via gene expression data and propose a method that incorporates variable selection into the inferential procedure, for the identification of the important biomarkers. To achieve this goal, we build upon a conjugate normal discriminant model, both linear and quadratic, and include a stochastic search variable selection procedure via an MCMC algorithm. Furthermore, we incorporate into the model prior information on the relationships among the genes as described by a gene–gene network. We use a Markov random field (MRF) prior to map the network connections among genes. Our prior model assumes that neighboring genes in the network are more likely to have a joint effect on the relevant biological processes. Results: We use simulated data to assess the performance of our method. In particular, we compare the MRF prior to a situation where independent Bernoulli priors are chosen for the individual predictors. We also illustrate the method on benchmark datasets for gene expression. Our simulation studies show that employing the MRF prior improves selection accuracy. In real data applications, in addition to identifying markers and improving prediction accuracy, we show how the integration of existing biological knowledge into the prior model results in an increased ability to identify genes with strong discriminatory power and also aids the interpretation of the results. Contact: marina@rice.edu
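
The contrast the summary draws between an MRF prior and independent Bernoulli priors can be illustrated with a minimal sketch. It assumes a common Ising-type parameterization in which the prior log-odds of including gene j equal mu plus eta times the number of its currently selected network neighbors. The names `mu`, `eta`, `neighbors`, and `mrf_conditional_inclusion`, as well as the toy network, are hypothetical and are not taken from this record; the article's exact prior may differ.

```python
import math

def mrf_conditional_inclusion(j, gamma, neighbors, mu, eta):
    """Prior P(gamma_j = 1 | neighbors' indicators) under an assumed Ising-type MRF.

    gamma     : list of 0/1 inclusion indicators, one per gene
    neighbors : dict mapping a gene index to the indices of its network neighbors
    mu        : baseline log-odds of inclusion (assumed hyperparameter)
    eta       : strength of the neighbor effect (assumed hyperparameter)
    """
    active_neighbors = sum(gamma[i] for i in neighbors[j])
    logit = mu + eta * active_neighbors
    return 1.0 / (1.0 + math.exp(-logit))

# Toy network (hypothetical): path 0 - 1 - 2, plus an isolated gene 3.
neighbors = {0: [1], 1: [0, 2], 2: [1], 3: []}
gamma = [1, 0, 1, 0]      # genes 0 and 2 are currently selected
mu, eta = -2.0, 1.0

# Gene 1 has two selected neighbors, so its prior inclusion probability rises.
p_connected = mrf_conditional_inclusion(1, gamma, neighbors, mu, eta)

# Gene 3 has no neighbors, so the MRF prior reduces to a Bernoulli prior.
p_isolated = mrf_conditional_inclusion(3, gamma, neighbors, mu, eta)

# Independent Bernoulli prior with the same baseline log-odds.
p_bernoulli = 1.0 / (1.0 + math.exp(-mu))

print(p_connected, p_isolated, p_bernoulli)
```

With `eta = 0` every gene falls back to the independent Bernoulli case, which is the comparison setting mentioned in the Results part of the summary.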
Associate Editor: Joaquin Dopazo