A comparison of reference-based algorithms for correcting cell-type heterogeneity in Epigenome-Wide Association Studies
| Published in | BMC Bioinformatics Vol. 18; no. 1; p. 105 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | London, BioMed Central, 13.02.2017, Springer Nature B.V |
| ISSN | 1471-2105 |
| DOI | 10.1186/s12859-017-1511-5 |
Summary:
Background
Intra-sample cellular heterogeneity presents numerous challenges to the identification of biomarkers in large Epigenome-Wide Association Studies (EWAS). While a number of reference-based deconvolution algorithms have emerged, their potential remains underexplored and a comparative evaluation of these algorithms beyond tissues such as blood is still lacking.
Results
Here we present a novel framework for reference-based inference, which leverages cell-type specific DNase Hypersensitive Site (DHS) information from the NIH Epigenomics Roadmap to construct an improved reference DNA methylation database. We show that this leads to a marginal but statistically significant improvement of cell-count estimates in whole blood as well as in mixtures involving epithelial cell-types. Using this framework we compare a widely used state-of-the-art reference-based algorithm (called constrained projection) to two non-constrained approaches including CIBERSORT and a method based on robust partial correlations. We conclude that the widely used constrained projection technique may not always be optimal. Instead, we find that the method based on robust partial correlations is generally more robust across a range of different tissue types and for realistic noise levels. We call the combined algorithm which uses DHS data and robust partial correlations for inference, EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity). Finally, we demonstrate the added value of EpiDISH in an EWAS of smoking.
Conclusions
Estimating cell-type fractions and subsequent inference in EWAS may benefit from the use of non-constrained reference-based cell-type deconvolution methods.
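The summary contrasts constrained projection with non-constrained estimation based on robust partial correlations. The following is a minimal sketch of the non-constrained idea, not the EpiDISH implementation. It assumes that the robust fit is a Huber-weighted regression of a sample's methylation profile on the reference profiles (with an intercept), and that negative coefficients are set to zero and the rest rescaled to sum to one. The function names (`solve`, `weighted_lstsq`, `robust_fractions`), the tuning constant, and the simulated data are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def weighted_lstsq(X, y, w):
    """Weighted least squares via the normal equations (X^T W X) beta = X^T W y."""
    n, p = len(X), len(X[0])
    A = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    rhs = [sum(w[i] * X[i][a] * y[i] for i in range(n)) for a in range(p)]
    return solve(A, rhs)


def robust_fractions(ref, y, k=1.345, n_iter=50):
    """Estimate cell-type fractions by Huber-weighted regression (assumed variant).

    ref: rows are CpGs, columns are reference cell types (methylation values).
    y:   methylation values of one mixed sample at the same CpGs.
    """
    X = [[1.0] + list(row) for row in ref]  # prepend an intercept column
    w = [1.0] * len(y)                      # first pass is ordinary least squares
    for _ in range(n_iter):
        beta = weighted_lstsq(X, y, w)
        resid = [yi - sum(x * b for x, b in zip(xi, beta)) for xi, yi in zip(X, y)]
        # Robust residual scale from the median absolute residual.
        scale = statistics.median(abs(r) for r in resid) / 0.6745 or 1e-12
        # Huber weights: full weight for small residuals, down-weighted otherwise.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    coefs = [max(b, 0.0) for b in beta[1:]]  # drop intercept, clip negatives
    total = sum(coefs)
    return [c / total for c in coefs] if total > 0 else coefs


# Simulated example: 3 hypothetical cell types, 60 CpGs, a few corrupted CpGs.
random.seed(0)
n_cpgs, true_w = 60, [0.6, 0.3, 0.1]
ref = [[random.random() for _ in true_w] for _ in range(n_cpgs)]
y = [sum(r * w for r, w in zip(row, true_w)) + random.gauss(0, 0.01) for row in ref]
for i in range(5):
    y[i] = 1.0  # simulate outlier CpGs not explained by the reference

estimate = robust_fractions(ref, y)
print([round(e, 3) for e in estimate])
```

No non-negativity or sum constraint is imposed during the fit in this sketch; both are applied only afterwards. This is the sense in which the approach is non-constrained, in contrast to a constrained projection, which would enforce them inside the optimisation.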