A binary search approach to whole-genome data analysis
A sequence analysis-oriented binary search-like algorithm was transformed to a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regu...
Saved in:
| Published in | Proceedings of the National Academy of Sciences - PNAS Vol. 107; no. 39; pp. 16893 - 16898 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published |
United States
National Academy of Sciences
28.09.2010
National Acad Sciences |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0027-8424 1091-6490 1091-6490 |
| DOI | 10.1073/pnas.1011134107 |
Cover
| Summary: | A sequence analysis-oriented binary search-like algorithm was transformed to a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regulated signals, at equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The "divide-and-conquer"-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity/specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets comprising Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation CHIP-on-chip datasets; Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The analyses' results demonstrate the power of the algorithm in identifying both the short up-regulated fragments (such as exons and transcription factor binding sites) and the long—even moderately up-regulated zones—at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies. |
|---|---|
| Bibliography: | SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 ObjectType-Article-1 ObjectType-Feature-2 content type line 23 ObjectType-Article-2 Contributed by Eviatar Nevo, August 2, 2010 (sent for review February 6, 2010) Author contributions: L.B. and E.N. designed research; L.B. and S.K. performed research; L.B., S.K., E.B., and E.N. analyzed data; and L.B., E.B., and E.N. wrote the paper. |
| ISSN: | 0027-8424 1091-6490 1091-6490 |
| DOI: | 10.1073/pnas.1011134107 |