PeakPass: Automating ChIP-Seq Blacklist Creation

Bibliographic Details
Published in: Bioinformatics Research and Applications, pp. 232-243
Main Authors: Wimberley, Charles E.; Heber, Steffen
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing, 2019
Series: Lecture Notes in Computer Science
ISBN: 9783030202415, 3030202410
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-030-20242-2_20

Summary: ChIP-Seq blacklists contain genomic regions that frequently produce artifacts and noise in ChIP-Seq experiments. To improve signal-to-noise ratio, ChIP-Seq pipelines often remove data points that map to blacklist regions. Existing blacklists have been compiled in a manual or semi-automated way. In this paper, we describe PeakPass, an efficient method to generate blacklists, and present evidence that blacklists can increase ChIP-Seq data quality. PeakPass leverages machine learning and attempts to automate blacklist generation. PeakPass uses a random forest classifier in combination with genomic features such as sequence, annotated repeats, complexity, assembly gaps, and the ratio of multi-mapping to uniquely mapping reads to identify artifact regions. We have validated PeakPass on a large dataset and tested it for the purpose of upgrading a blacklist to a new reference genome version. We trained PeakPass on the ENCODE blacklist for the hg19 human reference genome, and created an updated blacklist for hg38. To assess the performance of this blacklist, we tested 42 ChIP-Seq replicates from 24 experiments using the Relative Strand Correlation (RSC) metric as a quality measure. Using the blacklist generated by PeakPass resulted in a statistically significant increase in RSC over the existing ENCODE blacklist for hg38 – average RSC was increased by 50% over the ENCODE blacklist, while only filtering an average of 0.1% of called peaks.
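
The filtering step mentioned in the summary (removing called peaks that map to blacklist regions and reporting the fraction removed) can be illustrated with a minimal Python sketch. This is not the PeakPass implementation: the function names, the tuple layout `(chrom, start, end)` with half-open BED-style coordinates, and the example intervals are all assumptions made for illustration.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def filter_peaks(peaks, blacklist):
    """Drop peaks that overlap any blacklist region on the same chromosome.

    peaks, blacklist: lists of (chrom, start, end) tuples (assumed layout).
    Returns (kept_peaks, removed_fraction).
    """
    # Group blacklist regions by chromosome so each peak is only
    # compared against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    kept = [
        (chrom, start, end)
        for chrom, start, end in peaks
        if not any(overlaps(start, end, b_start, b_end)
                   for b_start, b_end in by_chrom.get(chrom, ()))
    ]
    removed_fraction = 1 - len(kept) / len(peaks) if peaks else 0.0
    return kept, removed_fraction


# Hypothetical example data, not taken from the chapter.
example_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
example_blacklist = [("chr1", 150, 300)]
kept, removed = filter_peaks(example_peaks, example_blacklist)
print(kept, removed)
```

The linear scan per peak keeps the sketch short; for genome-scale peak sets, a sorted or interval-tree lookup would be the more usual choice.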