CMfinder—a covariance model based RNA motif finding algorithm

Bibliographic Details
Published in Bioinformatics Vol. 22; no. 4; pp. 445-452
Main Authors Yao, Zizhen; Weinberg, Zasha; Ruzzo, Walter L.
Format Journal Article
Language English
Published Oxford Oxford University Press 15.02.2006
Oxford Publishing Limited (England)
ISSN 1367-4803
1367-4811
1460-2059
DOI 10.1093/bioinformatics/btk008

Summary:

Motivation: The recent discoveries of large numbers of non-coding RNAs and computational advances in genome-scale RNA search create a need for tools for automatic, high quality identification and characterization of conserved RNA motifs that can be readily used for database search. Previous tools fall short of this goal.

Results: CMfinder is a new tool to predict RNA motifs in unaligned sequences. It is an expectation maximization algorithm using covariance models for motif description, featuring novel integration of multiple techniques for effective search of motif space, and a Bayesian framework that blends mutual information-based and folding energy-based approaches to predict structure in a principled way. Extensive tests show that our method works well on datasets with either low or high sequence similarity, is robust to inclusion of lengthy extraneous flanking sequence and/or completely unrelated sequences, and is reasonably fast and scalable. In testing on 19 known ncRNA families, including some difficult cases with poor sequence conservation and large indels, our method demonstrates excellent average per-base-pair accuracy: 79% compared with at most 60% for alternative methods. More importantly, the resulting probabilistic model can be directly used for homology search, allowing iterative refinement of structural models based on additional homologs. We have used this approach to obtain highly accurate covariance models of known RNA motifs based on small numbers of related sequences, which identified homologs in deeply-diverged species.

Availability: Results and web server version are available at

Contact: yzizhen@cs.washington.edu

Supplementary information: Supplementary technical details are available at
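
The summary mentions mutual information as one of the signals used to predict structure. As a rough illustration of that general idea only, the sketch below computes the mutual information between two columns of a small, already-aligned set of sequences. It is not CMfinder's implementation: the function name `column_mutual_information`, the toy alignment, and the choice to skip gapped rows are all assumptions made for this example.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length strings; rows with a gap ('-')
    in either column are skipped (an assumption of this sketch).
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0

    joint = Counter(pairs)                 # counts of (base_i, base_j)
    left = Counter(a for a, _ in pairs)    # counts of base_i alone
    right = Counter(b for _, b in pairs)   # counts of base_j alone

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        ratio = count * n / (left[a] * right[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Hypothetical toy alignment: columns 0 and 4 covary perfectly,
# while columns 1 and 3 are fully conserved.
toy_alignment = ["GACUC", "CAGUG", "AAUUU", "UAAUA"]
covarying = column_mutual_information(toy_alignment, 0, 4)
conserved = column_mutual_information(toy_alignment, 1, 3)
print(covarying, conserved)
```

Column pairs that change together score high, while fully conserved columns score zero, which is one reason mutual information alone can miss conserved base pairs and is often combined with other evidence such as folding energy.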
Associate Editor: Thomas Lengauer