Variable selection for discriminant analysis with Markov random field priors for the analysis of microarray data

Bibliographic Details
Published in	Bioinformatics (Oxford, England) Vol. 27; no. 4; pp. 495-501
Main Authors	Stingo, Francesco C.; Vannucci, Marina
Format	Journal Article
Language	English
Published	Oxford: Oxford University Press, 15.02.2011
Subjects	Algorithms; Bayes Theorem; Biological and medical sciences; Discriminant Analysis; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; Gene Regulatory Networks; General aspects; Markov Chains; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Oligonucleotide Array Sequence Analysis - methods; Original Papers; Stochastic Processes; Random field; Discriminant analysis; Data analysis; Data; Microarray; Selection
ISSN	1367-4803 (print); 1367-4811 (online)
DOI	10.1093/bioinformatics/btq690

More Information
Summary:	Motivation: Discriminant analysis is an effective tool for classifying experimental units into groups. Here, we consider the typical problem of classifying subjects according to phenotype using gene expression data and propose a method that incorporates variable selection into the inferential procedure in order to identify the important biomarkers. To achieve this goal, we build upon a conjugate normal discriminant model, both linear and quadratic, and include a stochastic search variable selection procedure via a Markov chain Monte Carlo (MCMC) algorithm. Furthermore, we incorporate into the model prior information on the relationships among the genes, as described by a gene–gene network. We use a Markov random field (MRF) prior to map the network connections among genes. Our prior model assumes that neighboring genes in the network are more likely to have a joint effect on the relevant biological processes. Results: We use simulated data to assess the performance of our method. In particular, we compare the MRF prior with a setting in which independent Bernoulli priors are chosen for the individual predictors. We also illustrate the method on benchmark gene expression datasets. Our simulation studies show that employing the MRF prior improves selection accuracy. In real data applications, in addition to identifying markers and improving prediction accuracy, we show how integrating existing biological knowledge into the prior model increases the ability to identify genes with strong discriminatory power and also aids the interpretation of the results. Contact: marina@rice.edu
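The role of the MRF prior on gene inclusion can be illustrated with a minimal sketch. It assumes one common parameterization for binary inclusion indicators gamma_j over p genes with a symmetric, zero-diagonal gene-gene adjacency matrix G: log p(gamma) = d * sum_j gamma_j + e * sum_{j<k} G_jk gamma_j gamma_k + const, where d and e are hyperparameters. The symbols d and e and this exact form are assumptions for illustration and may differ from the paper's specification. Under this form, the prior log-odds of including gene j given all other indicators equal d plus e times the number of selected neighbours of j, so e > 0 favours selecting connected genes together, while e = 0 recovers independent Bernoulli priors with inclusion probability 1/(1 + exp(-d)), which is the comparison case described in the Results. All function names are hypothetical, and the sweep samples from the prior only; a stochastic search sampler for the discriminant model would also weight each update by the marginal likelihood of the expression data.

```python
import numpy as np


def log_mrf_prior(gamma, G, d, e):
    """Unnormalized log prior: d * sum_j gamma_j + e * sum_{j<k} G_jk gamma_j gamma_k.

    Assumed parameterization (hypothetical, for illustration only).
    G must be symmetric with a zero diagonal, so 0.5 * gamma' G gamma
    counts each undirected edge between selected genes exactly once.
    """
    gamma = np.asarray(gamma, dtype=float)
    return d * gamma.sum() + e * 0.5 * gamma @ G @ gamma


def conditional_inclusion_prob(j, gamma, G, d, e):
    """Prior P(gamma_j = 1 | all other gamma) under the assumed MRF prior."""
    gamma = np.asarray(gamma, dtype=float)
    # Number of selected neighbours of gene j (ignores any self-loop on the diagonal)
    n_selected_neighbours = G[j] @ gamma - G[j, j] * gamma[j]
    logit = d + e * n_selected_neighbours
    return 1.0 / (1.0 + np.exp(-logit))


def gibbs_sweep_prior(gamma, G, d, e, rng):
    """One Gibbs sweep over the indicators, sampling from the prior only.

    A full stochastic search would also multiply in the marginal
    likelihood of the data for each candidate value of gamma_j.
    """
    gamma = np.array(gamma, dtype=int)
    for j in range(len(gamma)):
        p = conditional_inclusion_prob(j, gamma, G, d, e)
        gamma[j] = int(rng.random() < p)
    return gamma


if __name__ == "__main__":
    # Hypothetical 4-gene chain network: 0 - 1 - 2 - 3
    G = np.zeros((4, 4))
    for a, b in [(0, 1), (1, 2), (2, 3)]:
        G[a, b] = G[b, a] = 1.0

    gamma = np.array([1, 0, 1, 0])
    # Gene 1 has two selected neighbours, so its inclusion is favoured when e > 0
    print(conditional_inclusion_prob(1, gamma, G, d=-2.0, e=1.5))
    # With e = 0 the prior reduces to independent Bernoulli(1 / (1 + exp(2)))
    print(conditional_inclusion_prob(1, gamma, G, d=-2.0, e=0.0))
    print(gibbs_sweep_prior(gamma, G, d=-2.0, e=1.5, rng=np.random.default_rng(0)))
```

With the hypothetical values d = -2 and e = 1.5, gene 1 in the example has two selected neighbours, so its prior log-odds rise from -2 to 1; this is the mechanism by which the network encourages neighbouring genes to enter the model together.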
Associate Editor:	Joaquin Dopazo