A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states

Background Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated reg...

Full description

Saved in:
Bibliographic Details
Published inBMC genomics Vol. 16; no. Suppl 2; pp. S8 - 8
Main Authors Ichikawa, Kazuki, Morishita, Shinichi
Format Journal Article
LanguageEnglish
Published London BioMed Central 21.01.2015
Subjects
Online AccessGet full text
ISSN1471-2164
1471-2164
DOI10.1186/1471-2164-16-S2-S8

Cover

More Information
Summary:Background Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated regions enriched with modified H3K27me3 (called, "K27HMD" regions), which are exposed to suppress the expression of key developmental genes relevant to cellular development and differentiation during embryonic stages in vertebrates. It is thus a biologically important issue to develop an effective optimization algorithm for detecting long DNA regions ( e.g ., >4 kbp in size) that harbor a specific combination of epigenetic modifications ( e.g ., K27HMD regions). However, to date, optimization algorithms for these purposes have received little attention, and available methods are still heuristic and ad hoc . Results In this paper, we propose a linear time algorithm for calculating a set of non-overlapping regions that maximizes the sum of similarities between the vector of focal epigenetic states and the vectors of raw epigenetic states at DNA positions in the set of regions. The average elapsed time to process the epigenetic data of any of human chromosomes was less than 2 seconds on an Intel Xeon CPU. To demonstrate the effectiveness of the algorithm, we estimated large K27HMD regions in the medaka and human genomes using our method, ChromHMM, and a heuristic method. Conclusions We confirmed that the advantages of our method over those of the two other methods. Our method is flexible enough to handle other types of epigenetic combinations. The program that implements the method is called "CSMinfinder" and is made available at: http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/Segmentation/
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ObjectType-Article-2
ObjectType-Conference-1
ObjectType-Feature-3
SourceType-Conference Papers & Proceedings-2
ISSN:1471-2164
1471-2164
DOI:10.1186/1471-2164-16-S2-S8