A binary search approach to whole-genome data analysis

A sequence analysis-oriented binary search-like algorithm was transformed to a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regu...

Full description

Saved in:
Bibliographic Details
Published inProceedings of the National Academy of Sciences - PNAS Vol. 107; no. 39; pp. 16893 - 16898
Main Authors Brodsky, Leonid, Kogan, Simon, BenJacob, Eshel, Nevo, Eviatar
Format Journal Article
LanguageEnglish
Published United States National Academy of Sciences 28.09.2010
National Acad Sciences
Subjects
Online AccessGet full text
ISSN0027-8424
1091-6490
1091-6490
DOI10.1073/pnas.1011134107

Cover

More Information
Summary:A sequence analysis-oriented binary search-like algorithm was transformed to a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regulated signals, at equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The "divide-and-conquer"-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity/specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets comprising Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation CHIP-on-chip datasets; Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The analyses' results demonstrate the power of the algorithm in identifying both the short up-regulated fragments (such as exons and transcription factor binding sites) and the long—even moderately up-regulated zones—at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies.
Bibliography:SourceType-Scholarly Journals-1
ObjectType-Feature-1
content type line 14
ObjectType-Article-1
ObjectType-Feature-2
content type line 23
ObjectType-Article-2
Contributed by Eviatar Nevo, August 2, 2010 (sent for review February 6, 2010)
Author contributions: L.B. and E.N. designed research; L.B. and S.K. performed research; L.B., S.K., E.B., and E.N. analyzed data; and L.B., E.B., and E.N. wrote the paper.
ISSN:0027-8424
1091-6490
1091-6490
DOI:10.1073/pnas.1011134107