A Regression-Based Differential Expression Detection Algorithm for Microarray Studies with Ultra-Low Sample Size

Bibliographic Details
Published in: PLoS ONE, Vol. 10, no. 3, p. e0118198
Main Authors: Vasiliu, Daniel; Clamons, Samuel; McDonough, Molly; Rabe, Brian; Saha, Margaret
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 04.03.2015
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0118198

Summary: Global gene expression analysis using microarrays and, more recently, RNA-seq, has allowed investigators to understand biological processes at a system level. However, the identification of differentially expressed genes in experiments with small sample size, high dimensionality, and high variance remains challenging, limiting the usability of the tens of thousands of publicly available, and possibly many more unpublished, gene expression datasets. We propose a novel variable selection algorithm for ultra-low-n microarray studies using generalized linear model-based variable selection with a penalized binomial regression algorithm called penalized Euclidean distance (PED). Our method uses PED to build a classifier on the experimental data to rank genes by importance. In place of cross-validation, which is required by most similar methods but not reliable for experiments with small sample size, we use a simulation-based approach to additively build a list of differentially expressed genes from the rank-ordered list. Our simulation-based approach maintains a low false discovery rate while maximizing the number of differentially expressed genes identified, a feature critical for downstream pathway analysis. We apply our method to microarray data from an experiment perturbing the Notch signaling pathway in Xenopus laevis embryos. This dataset was chosen because it showed very little differential expression according to limma, a powerful and widely used method for microarray analysis. Our method was able to detect a significant number of differentially expressed genes in this dataset and suggest future directions for investigation. Our method is easily adaptable for analysis of data from RNA-seq and other global expression experiments with low sample size and high dimensionality.
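The ranking step described in the summary (fit a penalized binomial regression, then order genes by importance) can be illustrated with a minimal sketch. This is not the authors' PED algorithm: as a stand-in it assumes an ordinary ridge (L2) penalized logistic regression fitted by gradient descent, and it omits the simulation-based step that builds the final gene list. The function names `fit_ridge_logistic` and `rank_genes`, the penalty and step-size values, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random


def fit_ridge_logistic(X, y, lam=1.0, lr=0.1, steps=500):
    """Fit an L2-penalized logistic regression by gradient descent.

    X: list of samples, each a list of per-gene expression values.
    y: list of 0/1 condition labels.
    Returns one coefficient per gene (the intercept is not returned).
    NOTE: ridge is an assumed stand-in here, not the PED penalty.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]  # gradient of the ridge penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def rank_genes(X, y, **kwargs):
    """Return gene indices ordered by decreasing absolute coefficient."""
    w = fit_ridge_logistic(X, y, **kwargs)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


# Hypothetical toy data: 3 control and 3 treated samples, 20 genes,
# with only gene 0 shifted between the two conditions.
random.seed(0)
n_genes = 20
y = [0, 0, 0, 1, 1, 1]
X = []
for label in y:
    row = [random.gauss(0.0, 0.3) for _ in range(n_genes)]
    row[0] += 3.0 * label
    X.append(row)

ranking = rank_genes(X, y)
print(ranking[:5])
```

In this toy setting the shifted gene (index 0) is expected to head the ranking. Deciding how far down such a ranked list to go is the part the summary assigns to the simulation-based procedure, which this sketch does not attempt.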
Competing Interests: The authors have declared that no competing interests exist.
Current Address: Department of Bioengineering, California Institute of Technology, Pasadena, California, United States of America
Conceived and designed the experiments: MS DV SC. Performed the experiments: BR MM DV SC. Analyzed the data: SC DV MS. Contributed reagents/materials/analysis tools: MS. Wrote the paper: MS SC DV. Edited the manuscript: MS SC DV MM BR.
Current Address: College of Medicine, University of Cincinnati, Cincinnati, Ohio, United States of America
Current Address: Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America