Prediction of similarly acting cis-regulatory modules by subsequence profiling and comparative genomics in Drosophila melanogaster and D.pseudoobscura

Bibliographic Details
Published in Bioinformatics Vol. 20; no. 16; pp. 2738-2750
Main Authors Grad, Yonatan H., Roth, Frederick P., Halfon, Marc S., Church, George M.
Format Journal Article
Language English
Published Oxford Oxford University Press 01.11.2004
Oxford Publishing Limited (England)
ISSN 1367-4803, 1367-4811, 1460-2059
DOI 10.1093/bioinformatics/bth320

Summary:
Motivation: To date, computational searches for cis-regulatory modules (CRMs) have relied on two methods. The first, phylogenetic footprinting, has been used to find CRMs in non-coding sequence, but does not directly link DNA sequence with spatio-temporal patterns of expression. The second, based on searches for combinations of transcription factor (TF) binding motifs, has been employed in genome-wide discovery of similarly acting enhancers, but requires prior knowledge of the set of TFs acting at the CRM and the TFs' binding motifs.
Results: We propose a method for CRM discovery that combines aspects of both approaches in an effort to overcome their individual limitations. By treating phylogenetically footprinted non-coding regions (PFRs) as proxies for CRMs, we endeavor to find PFRs near co-regulated genes that are comprised of similar short, conserved sequences. Using Markov chains as a convenient formulation to assess similarity, we develop a sampling algorithm to search a large group of PFRs for the most similar subset. When starting with a set of genes involved in Drosophila early blastoderm development and using phylogenetic comparisons of Drosophila melanogaster and D.pseudoobscura genomes, we show here that our algorithm successfully detects known CRMs. Further, we use our similarity metric, based on Markov chain discrimination, in a genome-wide search, and uncover additional known and many candidate early blastoderm CRMs.
Availability: Software is available via http://arep.med.harvard.edu/enhancers
Supplementary information: Can be accessed at http://arep.med.harvard.edu/enhancers
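Illustrative note: the summary describes scoring sequence similarity with Markov chains. The sketch below shows one common way such a score can be formed, as a log-likelihood ratio between a chain trained on a set of example sequences and a background chain. It is not the authors' software. The function names, the chain order, the pseudocount smoothing, and the choice of a log-likelihood ratio are all assumptions made here for illustration.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"


def train_markov_chain(sequences, order=2, pseudocount=1.0):
    """Estimate P(next base | previous `order` bases).

    Hypothetical helper: smoothing with a positive pseudocount is an
    assumption of this sketch, not a detail taken from the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows that contain ambiguous symbols such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1.0
    model = {}
    for context in map("".join, product(ALPHABET, repeat=order)):
        total = sum(counts[context].values()) + pseudocount * len(ALPHABET)
        model[context] = {
            b: (counts[context][b] + pseudocount) / total for b in ALPHABET
        }
    return model


def log_likelihood_ratio(seq, positive, background, order=2):
    """Sum of log P_positive / P_background over all transitions in `seq`.

    Positive scores mean the sequence resembles the positive training set
    more than the background, under the two chains.
    """
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if context in positive and base in ALPHABET:
            score += math.log(positive[context][base] / background[context][base])
    return score
```

In a workflow like the one the summary outlines, the positive chain might be trained on a candidate subset of PFRs and the background chain on other non-coding sequence, with each remaining PFR ranked by its score. How the published method trains and compares its chains, and how its sampling step selects the subset, is not stated in this record.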
Bibliography: Contact: see http://arep.med.harvard.edu/email.html
istex:AD7C9A6806F9E0E9BEDFAA633ECABB2BC4A08C4C
ark:/67375/HXZ-BX2CPF7N-S
local:bth320