A binary search approach to whole-genome data analysis

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences - PNAS Vol. 107; no. 39; pp. 16893 - 16898
Main Authors	Brodsky, Leonid; Kogan, Simon; BenJacob, Eshel; Nevo, Eviatar
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 28.09.2010
Subjects	Accuracy; Algorithms; Arabidopsis; Arabidopsis - genetics; Binding sites; Biological Sciences; Biology; chromatin; Chromatin remodeling; Chromosome Mapping - statistics & numerical data; Chromosomes; Comparative analysis; data collection; Data processing; Evolution; Exons; Gene Expression Profiling - statistics & numerical data; gene expression regulation; Genome-Wide Association Study - statistics & numerical data; Genomes; Genomics; Histones; Introns; Landscape; Lysine; Methylation; Oligonucleotide Array Sequence Analysis - statistics & numerical data; RNA Splicing; Saccharomyces cerevisiae; Saccharomyces cerevisiae - genetics; Sequence Analysis, DNA - methods; Signal detection; Signal noise; Tiling; transcription (genetics); Transcription factors
ISSN	0027-8424, 1091-6490
DOI	10.1073/pnas.1011134107

More Information
Summary:	A sequence analysis-oriented, binary search-like algorithm was transformed into a sensitive and accurate analysis tool for processing whole-genome data. The advantage of the algorithm over previous methods is its ability to detect the margins of both short and long genome fragments, enriched by up-regulated signals, at equal accuracy. The score of an enriched genome fragment reflects the difference between the actual concentration of up-regulated signals in the fragment and the chromosome signal baseline. The "divide-and-conquer"-type algorithm detects a series of nonintersecting fragments of various lengths with locally optimal scores. The procedure is applied to detected fragments in a nested manner by recalculating the lower-than-baseline signals in the chromosome. The algorithm was applied to simulated whole-genome data, and its sensitivity and specificity were compared with those of several alternative algorithms. The algorithm was also tested with four biological tiling array datasets: Arabidopsis (i) expression and (ii) histone 3 lysine 27 trimethylation ChIP-on-chip datasets, and Saccharomyces cerevisiae (iii) spliced intron data and (iv) chromatin remodeling factor binding sites. The results of the analyses demonstrate the power of the algorithm in identifying both short up-regulated fragments (such as exons and transcription factor binding sites) and long, even moderately up-regulated, zones at their precise genome margins. The algorithm generates an accurate whole-genome landscape that could be used for cross-comparison of signals across the same genome in evolutionary and general genomic studies.
Bibliography:	Contributed by Eviatar Nevo, August 2, 2010 (sent for review February 6, 2010). Author contributions: L.B. and E.N. designed research; L.B. and S.K. performed research; L.B., S.K., E.B., and E.N. analyzed data; and L.B., E.B., and E.N. wrote the paper.
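
Illustrative sketch:	The Summary describes scoring a fragment by how far its signal rises above the chromosome baseline, then finding nonintersecting, locally optimal fragments in a divide-and-conquer fashion. The Python sketch below illustrates that general idea only and is not the authors' published procedure. It assumes a fragment's score is the sum of (signal minus baseline) over its positions, and it swaps in Kadane's maximum-sum scan to pick the best fragment in each interval before recursing on the regions to its left and right. The names `best_fragment`, `enriched_fragments`, and `min_score` are hypothetical, and the nested re-analysis step mentioned in the Summary is omitted.

```python
from typing import List, Sequence, Tuple

# (start, end_exclusive, score); a hypothetical record type for this sketch
Fragment = Tuple[int, int, float]


def best_fragment(excess: Sequence[float], lo: int, hi: int) -> Fragment:
    """Return the maximum-sum run inside excess[lo:hi] (Kadane's scan).

    An empty fragment (lo, lo, 0.0) is returned when no run has a positive sum.
    """
    best: Fragment = (lo, lo, 0.0)
    run_start, run_sum = lo, 0.0
    for i in range(lo, hi):
        if run_sum <= 0.0:
            # A non-positive prefix can only hurt, so start a new run here.
            run_start, run_sum = i, 0.0
        run_sum += excess[i]
        if run_sum > best[2]:
            best = (run_start, i + 1, run_sum)
    return best


def enriched_fragments(
    signal: Sequence[float], baseline: float, min_score: float
) -> List[Fragment]:
    """Collect nonintersecting above-baseline fragments, ordered by position.

    Assumption: score = sum of (signal - baseline) over the fragment.
    The recursion depth grows with the number of fragments found.
    """
    excess = [x - baseline for x in signal]
    found: List[Fragment] = []

    def recurse(lo: int, hi: int) -> None:
        if lo >= hi:
            return
        start, end, score = best_fragment(excess, lo, hi)
        if end == start or score < min_score:
            return  # nothing in this interval is enriched enough
        recurse(lo, start)                 # conquer the left remainder
        found.append((start, end, score))
        recurse(end, hi)                   # conquer the right remainder

    recurse(0, len(signal))
    return found


# Toy signal: one strong short peak and one weaker peak, separated by a gap.
toy = [0, 0, 5, 6, 5, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0]
print(enriched_fragments(toy, baseline=1.0, min_score=3.0))
# prints [(2, 5, 13.0), (13, 15, 6.0)]
```

In this toy run the two peaks are reported separately because the below-baseline gap between them costs more than the second peak gains. With a shorter gap the same scoring would merge them into one longer fragment, which is the kind of trade-off a baseline-relative score is meant to capture.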