HIGEDA: a hierarchical gene-set genetics based algorithm for finding subtle motifs in biological sequences

Motivation: Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms...

Full description

Saved in:
Bibliographic Details
Published inBioinformatics Vol. 26; no. 3; pp. 302 - 309
Main Authors Le, Thanh, Altman, Tom, Gardiner, Katheleen
Format Journal Article
LanguageEnglish
Published Oxford Oxford University Press 01.02.2010
Subjects
Online AccessGet full text
ISSN1367-4803
1367-4811
1460-2059
1367-4811
DOI10.1093/bioinformatics/btp676

Cover

More Information
Summary:Motivation: Identification of motifs in biological sequences is a challenging problem because such motifs are often short, degenerate, and may contain gaps. Most algorithms that have been developed for motif-finding use the expectation-maximization (EM) algorithm iteratively. Although EM algorithms can converge quickly, they depend strongly on initialization parameters and can converge to local sub-optimal solutions. In addition, they cannot generate gapped motifs. The effectiveness of EM algorithms in motif finding can be improved by incorporating methods that choose different sets of initial parameters to enable escape from local optima, and that allow gapped alignments within motif models. Results: We have developed HIGEDA, an algorithm that uses the hierarchical gene-set genetic algorithm (HGA) with EM to initiate and search for the best parameters for the motif model. In addition, HIGEDA can identify gapped motifs using a position weight matrix and dynamic programming to generate an optimal gapped alignment of the motif model with sequences from the dataset. We show that HIGEDA outperforms MEME and other motif-finding algorithms on both DNA and protein sequences. Availability and implementation: Source code and test datasets are available for download at http://ouray.cudenver.edu/∼tnle/, implemented in C++ and supported on Linux and MS Windows. Contact: katheleen.gardiner@ucdenver.edu
Bibliography:ArticleID:btp676
Associate Editor: Limsoon Wong
To whom correspondence should be addressed.
istex:CE3913F3E147A6BA9A4B179BD6DBD927D49BAE14
ark:/67375/HXZ-GZTXLNF1-D
ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1367-4803
1367-4811
1460-2059
1367-4811
DOI:10.1093/bioinformatics/btp676