A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies

Bibliographic Details
Published in BMC Bioinformatics Vol. 18; no. 1; p. 105
Main Authors Teschendorff, Andrew E., Breeze, Charles E., Zheng, Shijie C., Beck, Stephan
Format Journal Article
Language English
Published London BioMed Central 13.02.2017
Springer Nature B.V
ISSN 1471-2105
DOI 10.1186/s12859-017-1511-5

Summary: Background: Intra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking. Results: Here we present a novel framework for reference-based inference, which leverages cell-type specific DNase Hypersensitive Site (DHS) information from the NIH Epigenomics Roadmap to construct an improved reference DNA methylation database. We show that this leads to a marginal but statistically significant improvement of cell-count estimates in whole blood as well as in mixtures involving epithelial cell-types. Using this framework we compare a widely used state-of-the-art reference-based algorithm (called constrained projection) to two non-constrained approaches: CIBERSORT and a method based on robust partial correlations. We conclude that the widely used constrained projection technique may not always be optimal. Instead, we find that the method based on robust partial correlations is generally more robust across a range of different tissue types and for realistic noise levels. We call the combined algorithm, which uses DHS data and robust partial correlations for inference, EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity). Finally, we demonstrate the added value of EpiDISH in an EWAS of smoking. Conclusions: Estimating cell-type fractions and subsequent inference in EWAS may benefit from the use of non-constrained reference-based cell-type deconvolution methods.