Alternative empirical Bayes models for adjusting for batch effects in genomic studies

Bibliographic Details
Published in	BMC Bioinformatics Vol. 19; no. 1; article no. 262 (15 pp.)
Main Authors	Zhang, Yuqing; Jenkins, David F.; Manimaran, Solaiappan; Johnson, W. Evan
Format	Journal Article
Language	English
Published	London: BioMed Central, 13.07.2018 (publisher also listed as BioMed Central Ltd; Springer Nature B.V.; BMC)
Subjects	Algorithms; Batch effects; Batch processing; Bayes Theorem; Bayesian analysis; Bias; Bioinformatics; Biomarker development; Biomarkers; Biomedical and Life Sciences; Comparative analysis; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Computer programs; Computer simulation; Data integration; Data processing; Decomposition; DNA methylation; Empirical analysis; Empirical Bayes models; Experiments; Gene expression; Genome-wide association studies; Genomes; Genomics; Genomics - methods; Heterogeneity; Humans; Life Sciences; Lung cancer; Methodology; Methodology Article; Microarrays; Reagents; Research Design; Researchers; Software; Software development tools; Studies; Transcriptome analysis
ISSN	1471-2105
DOI	10.1186/s12859-018-2263-6

Summary:	Background: Combining genomic data sets from multiple studies is advantageous to increase statistical power in studies where logistical considerations restrict sample size or require the sequential generation of data. However, significant technical heterogeneity is commonly observed across multiple batches of data that are generated from different processing or reagent batches, experimenters, protocols, or profiling platforms. These so-called batch effects often confound true biological relationships in the data, reducing the power benefits of combining multiple batches, and may even lead to spurious results in some combined studies. Therefore, there is a significant need for effective methods and software tools that account for batch effects in high-throughput genomic studies. Results: Here we contribute multiple methods and software tools for improved combination and analysis of data from multiple batches. In particular, we provide batch effect solutions for cases where the severity of the batch effects is not extreme, and for cases where one high-quality batch can serve as a reference, such as the training set in a biomarker study. We illustrate our approaches and software in both simulated and real data scenarios. Conclusions: We demonstrate the value of these new contributions compared to currently established approaches in the specified batch correction situations.