Integration of multiple epigenomic marks improves prediction of variant impact in saturation mutagenesis reporter assay

Bibliographic Details
Published in: Human Mutation, Vol. 40, no. 9, pp. 1280-1291
Main Authors: Shigaki, Dustin; Adato, Orit; Adhikari, Aashish N.; Dong, Shengcheng; Hawkins‐Hooker, Alex; Inoue, Fumitaka; Juven‐Gershon, Tamar; Kenlay, Henry; Martin, Beth; Patra, Ayoti; Penzar, Dmitry D.; Schubach, Max; Xiong, Chenling; Yan, Zhongxia; Boyle, Alan P.; Kreimer, Anat; Kulakovskiy, Ivan V.; Reid, John; Unger, Ron; Yosef, Nir; Shendure, Jay; Ahituv, Nadav; Kircher, Martin; Beer, Michael A.
Format: Journal Article
Language: English
Published: United States, John Wiley & Sons, Inc, 01.09.2019
ISSN: 1059-7794, 1098-1004
DOI: 10.1002/humu.23797

Summary: The integrative analysis of high‐throughput reporter assays, machine learning, and profiles of epigenomic chromatin state in a broad array of cells and tissues has the potential to significantly improve our understanding of noncoding regulatory element function and its contribution to human disease. Here, we report results from the CAGI 5 regulation saturation challenge where participants were asked to predict the impact of nucleotide substitution at every base pair within five disease‐associated human enhancers and nine disease‐associated promoters. A library of mutations covering all bases was generated by saturation mutagenesis and altered activity was assessed in a massively parallel reporter assay (MPRA) in relevant cell lines. Reporter expression was measured relative to plasmid DNA to determine the impact of variants. The challenge was to predict the functional effects of variants on reporter expression. Comparative analysis of the full range of submitted prediction results identifies the most successful models of transcription factor binding sites, machine learning algorithms, and ways to choose among or incorporate diverse datatypes and cell‐types for training computational models. These results have the potential to improve the design of future studies on more diverse sets of regulatory elements and aid the interpretation of disease‐associated genetic variation.